akride package

Subpackages

Submodules

akride.background_task_manager module

class akride.background_task_manager.BackgroundTaskManager[source]

Bases: object

Helper class to manage background tasks.

is_task_running(entity_id: str, task_type: BackgroundTaskType) bool[source]

Parameters:
  • entity_id (str) – Entity ID associated with the task.

  • task_type (BackgroundTaskType) – The type of the background task.

Returns:

Whether the task is running.

Return type:

bool

start_task(entity_id: str, task_type: BackgroundTaskType, target_function, *args, **kwargs) BackgroundTask[source]

Start a background task.

Parameters:
  • entity_id (str) – Entity ID associated with the task.

  • task_type (BackgroundTaskType) – The type of the background task.

  • target_function – The target function to run.

  • args – Arguments for the target function.

  • kwargs – Keyword arguments for the target function.

Returns:

background task object

Return type:

BackgroundTask
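The two methods above suggest a registry of tasks keyed by entity ID and task type. The following is a minimal sketch of that idea using only the standard library; it is not akride's implementation. SketchTaskManager and SketchTaskType are hypothetical names, and rejecting a duplicate task with RuntimeError is an assumption made for illustration.

```python
import threading
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class SketchTaskType(Enum):
    """Hypothetical stand-in for akride's BackgroundTaskType."""

    INGEST = "ingest"


class SketchTaskManager:
    """Illustrative only: tracks one thread per (entity_id, task_type) pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[Tuple[str, SketchTaskType], threading.Thread] = {}

    def is_task_running(self, entity_id: str, task_type: SketchTaskType) -> bool:
        # A task counts as running while its thread is still alive.
        with self._lock:
            thread = self._tasks.get((entity_id, task_type))
        return thread is not None and thread.is_alive()

    def start_task(
        self,
        entity_id: str,
        task_type: SketchTaskType,
        target_function: Callable[..., Any],
        *args: Any,
        **kwargs: Any,
    ) -> threading.Thread:
        key = (entity_id, task_type)
        with self._lock:
            existing = self._tasks.get(key)
            # Assumption: a second task for the same key is rejected.
            if existing is not None and existing.is_alive():
                raise RuntimeError(f"task already running for {key}")
            thread = threading.Thread(
                target=target_function, args=args, kwargs=kwargs, daemon=True
            )
            self._tasks[key] = thread
            thread.start()
        return thread


# Usage: run a task that blocks until released, then check its state.
manager = SketchTaskManager()
release = threading.Event()
task = manager.start_task("dataset-1", SketchTaskType.INGEST, release.wait, 5)
print(manager.is_task_running("dataset-1", SketchTaskType.INGEST))
release.set()
task.join()
```

Keying on both values lets different task types run concurrently for the same entity while still answering is_task_running with a single dictionary lookup.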

akride.client module

Copyright (C) 2023, Akridata, Inc - All Rights Reserved. Unauthorized copying of this file, via any medium is strictly prohibited

class akride.client.AkriDEClient(sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = None)[source]

Bases: object

Client class to connect to DataExplorer

add_to_catalog(dataset: Dataset, table_name: str, csv_file_path: str) bool[source]

Adds new items to an existing catalog.

Parameters:
  • dataset (Dataset) – The dataset to import the catalog into.

  • table_name (str) – The name of the table to create for the catalog.

  • csv_file_path (str) – The path to the CSV file containing new catalog data.

Returns:

Indicates whether the operation was successful.

Return type:

bool

create_dataset(spec: Dict[str, Any]) Entity[source]

Creates a new dataset entity.

Parameters:

spec (Dict[str, Any]) –

The dataset spec. The spec should have the following fields:

dataset_name: str

The name of the new dataset.

dataset_namespace: str, optional

The namespace for the dataset, by default ‘default’.

data_type: DataType, optional

The type of data to store in the dataset, by default DataType.IMAGE.

glob_pattern: str, optional

The glob pattern for the dataset, by default ‘*(png|jpg|gif|jpeg|tiff|tif|bmp)’.

overwrite: bool, optional

Overwrite if a dataset with the same name exists.

Returns:

The created entity

Return type:

Entity
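As a rough illustration of the spec described above, the sketch below assembles the dictionary from plain Python values. build_dataset_spec is a hypothetical helper, not part of akride. It leaves out data_type, which would need akride's DataType enum, and omits unset optional fields so that the documented defaults can apply.

```python
from typing import Any, Dict, Optional


def build_dataset_spec(
    dataset_name: str,
    dataset_namespace: Optional[str] = None,
    glob_pattern: Optional[str] = None,
    overwrite: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect create_dataset spec fields into a dict."""
    if not dataset_name:
        raise ValueError("dataset_name is required")
    spec: Dict[str, Any] = {"dataset_name": dataset_name}
    optional_fields = {
        "dataset_namespace": dataset_namespace,
        "glob_pattern": glob_pattern,
        "overwrite": overwrite,
    }
    # Only include the optional fields the caller actually set.
    spec.update({k: v for k, v in optional_fields.items() if v is not None})
    return spec


spec = build_dataset_spec("demo-images", overwrite=True)
print(spec)
# With a connected client, the spec would then be passed on, for example:
# dataset = client.create_dataset(spec=spec)
```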

create_job(spec: JobSpec) Job[source]

Creates an explore job for the specified dataset.

Parameters:

spec: JobSpec

The job specification, for example one returned by create_job_spec. The signature takes no separate dataset argument.

Returns:

Job

The newly created Job object.

create_job_spec(dataset: Dataset, job_type: str | JobType = 'EXPLORE', job_name: str = '', predictions_file: str = '', cluster_algo: str | ClusterAlgoType = 'hdbscan', embed_algo: str | EmbedAlgoType = 'umap', num_clusters: int | None = None, max_images: int = 1000, catalog_table: CatalogTable | None = None, analyze_params: AnalyzeJobParams | None = None, pipeline: Pipeline | None = None, filters: List[Condition] | None = None) JobSpec[source]

Creates a JobSpec object that specifies how a job is to be created.

Parameters:

dataset: Dataset

The dataset to explore.

job_type: JobType, optional

The job type. Defaults to ‘EXPLORE’.

job_name: str, optional

The name of the job to create. A unique name will be generated if this is not given.

predictions_file: str, optional

The path to the catalog file containing predictions and ground truth. This file must be formatted according to the specification at:

https://docs.akridata.ai/docs/analyze-job-creation-and-visualization

cluster_algo: ClusterAlgoType, optional

The clustering algorithm to use.

embed_algo: EmbedAlgoType, optional

The embedding algorithm to use.

num_clusters: int, optional

The number of clusters to create.

max_images: int, optional

The maximum number of images to use.

catalog_table: CatalogTable, optional

The catalog to be used for creating this explore job. This defaults to the internal primary catalog that is created automatically when a dataset is created. default: “primary”

analyze_params: AnalyzeJobParams, optional

Analyze job related configuration parameters

filters: List[Condition], optional

The filters to be used to select a subset of samples for this job. These filters are applied to the catalog specified by catalog_table.

create_resultset(spec: Dict[str, Any]) Entity[source]

Creates a new resultset entity.

Parameters:

spec (Dict[str, Any]) –

The resultset spec. The spec should have the following fields:
job: Job

The associated job object.

name: str

The name of the new resultset.

samples: SampleInfoList

The samples to be included in this resultset.

Returns:

The created entity

Return type:

Entity

create_view(view_name: str, description: str | None, dataset: Dataset, left_table: CatalogTable, right_table: CatalogTable, join_condition: JoinCondition) str[source]

Create a SQL view for visualization

Parameters:
  • view_name (str) – Name of the view to create

  • description (Optional[str]) – Description text

  • dataset (Dataset) – Dataset object

  • left_table (CatalogTable) – Left table of the create view query

  • right_table (CatalogTable) – Right table of the create view query

  • join_condition (JoinCondition) – JoinCondition which includes the column from the left and the right table

Returns:

view id

Return type:

str

delete_catalog(catalog: Catalog) bool[source]

Deletes a catalog object.

Parameters:

catalog (Catalog) – The catalog object to delete.

Returns:

Indicates whether the operation was successful.

Return type:

bool

delete_dataset(dataset: Dataset) bool[source]

Deletes a dataset object.

Parameters:

dataset (Dataset) – The dataset object to delete.

Returns:

Indicates whether this entity was successfully deleted

Return type:

bool

delete_job(job: Job) bool[source]

Deletes a job object.

Parameters:

job (Job) – The job object to delete.

Returns:

Indicates whether the operation was successful.

Return type:

bool

delete_resultset(resultset: Resultset) bool[source]

Deletes a resultset object.

Parameters:

resultset (Resultset) – The resultset object to delete.

Returns:

Indicates whether the operation was successful.

Return type:

bool

get_all_columns(dataset: Dataset, table: CatalogTable) List[Column][source]

Returns all columns for a table/view

Parameters:
  • dataset (Dataset) – Dataset object

  • table (CatalogTable) – Table information

Returns:

List of columns of the table

Return type:

List[Column]

get_attached_pipelines(dataset: Dataset, version: str | None = None) List[Pipeline][source]

Gets the pipelines attached to a dataset for a given dataset version.

Parameters:
  • dataset (Dataset) – Dataset object

  • version (str, optional) – Dataset version. Defaults to None, in which case the latest version is used.

Returns:

List of pipelines attached to the dataset

Return type:

List[Pipeline]

get_catalog_by_name(dataset: Dataset, name: str) Entity | None[source]

Retrieves a catalog with the given name.

Parameters:
  • dataset (Dataset) – The dataset to retrieve the catalog from.

  • name (str) – The name of the catalog to retrieve.

Returns:

The Entity object representing the catalog.

Return type:

Entity

get_catalog_tags(samples: SampleInfoList) DataFrame[source]

Retrieves the catalog tags corresponding to the given samples.

Parameters:

samples (SampleInfoList) – The samples to retrieve catalog tags for.

Returns:

A dataframe of catalog tags.

Return type:

pd.DataFrame

get_catalogs(attributes: Dict[str, Any] = {}) List[Entity][source]

Retrieves information about catalogs that have the given attributes.

Parameters:

attributes (Dict[str, Any]) –

The filter specification. It may have the following optional fields:

name: str

Filter by catalog name.

status: str

Filter by catalog status. Can be one of “active”, “inactive”, “refreshing”, “offline”, “invalid-config”.

Returns:

A list of Entity objects representing catalogs.

Return type:

List[Entity]
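Since the status filter accepts only a fixed set of values, it can be worth checking the filter before sending it. The sketch below is one way to do that; build_catalog_filter is a hypothetical helper, not an akride function, and the status values are the ones listed above.

```python
from typing import Any, Dict, Optional

# Status values as listed in the get_catalogs documentation.
CATALOG_STATUSES = ("active", "inactive", "refreshing", "offline", "invalid-config")


def build_catalog_filter(
    name: Optional[str] = None, status: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical helper: build the attributes dict for get_catalogs."""
    attributes: Dict[str, Any] = {}
    if name is not None:
        attributes["name"] = name
    if status is not None:
        # Reject typos locally rather than sending an unknown status.
        if status not in CATALOG_STATUSES:
            raise ValueError(f"unknown catalog status: {status!r}")
        attributes["status"] = status
    return attributes


attributes = build_catalog_filter(status="active")
print(attributes)
# With a connected client:
# catalogs = client.get_catalogs(attributes=attributes)
```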

get_dataset_by_name(name: str) Entity | None[source]

Retrieves a dataset with the given name.

Parameters:

name (str) – The name of the dataset to retrieve.

Returns:

The Entity object representing the dataset.

Return type:

Entity

get_datasets(attributes: Dict[str, Any] = {}) List[Entity][source]

Retrieves information about datasets that have the given attributes.

Parameters:

attributes (Dict[str, Any], optional) –

The filter specification. It may have the following optional fields:

search_key: str

Filter across fields like dataset id, and dataset name.

Returns:

A list of Entity objects representing datasets.

Return type:

List[Entity]

get_fullres_image_urls(samples: SampleInfoList) Dict[source]

Retrieves the full-resolution image URLs for the given samples.

Parameters:

samples (SampleInfoList) – The samples to retrieve full-resolution image URLs for.

Returns:

A dictionary containing the full-resolution image URLs for each sample.

Return type:

Dict

get_fullres_images(samples: SampleInfoList) List[Image][source]

Retrieves the full-resolution images for the provided samples.

Parameters:

samples (SampleInfoList) – The samples to retrieve images for.

Returns:

A list of images.

Return type:

List[Image.Image]

get_job_by_name(name: str) Job[source]

Retrieves a job with the given name.

Parameters:

name (str) – The name of the job to retrieve.

Returns:

The Entity object representing the job.

Return type:

Entity

get_job_display_panel(job: Job) str[source]

Retrieves the URL of the job panel in Data Explorer.

Parameters:

job (Job) – The Job object to be queried.

Returns:

The job panel URL.

Return type:

str

get_job_samples(job: Job, job_context: JobContext, spec: SimilaritySearchSpec | ConfusionMatrixCellSpec | ClusterRetrievalSpec | CoresetSamplingSpec, **kwargs) SampleInfoList[source]

Retrieves the samples according to the given specification.

Parameters:
  • job (Job) – The Job object to get samples for.

  • job_context (JobContext) – The context in which the samples are requested.

  • spec (Union[SimilaritySearchSpec, ConfusionMatrixCellSpec, ClusterRetrievalSpec, CoresetSamplingSpec]) – The job context spec.

  • **kwargs – Additional keyword arguments. Supported keyword arguments:

    iou_config_threshold: float, optional

    Threshold value for iou config

    confidence_score_threshold: float, optional

    Threshold value for confidence score

Returns:

A SampleInfoList object.

Return type:

SampleInfoList

get_job_samples_from_file_path(job: Job, file_info: List[str]) Dict[source]

Retrieves the samples corresponding to the given file paths.

Parameters:
  • job (Job) – The Job object to get samples for.

  • file_info (List[str]) – List of file_paths for the images of interest

Returns:

A dictionary mapping each file_path to its point_ids.

Return type:

Dict

get_job_statistics(job: Job, context: JobStatisticsContext, **kwargs) JobStatistics[source]

Retrieves statistics info from an analyze job.

Parameters:
  • job (Job) – The Job object to get statistics for.

  • context (JobStatisticsContext) – The type of statistics to retrieve.

  • **kwargs – Additional keyword arguments. Supported keyword arguments:

    iou_config_threshold: float, optional

    Threshold value for iou config

    confidence_score_threshold: float, optional

    Threshold value for confidence score

Returns:

A job statistics object.

Return type:

JobStatistics

get_jobs(attributes: Dict[str, Any] = {}) List[Entity][source]

Retrieves information about jobs that have the given attributes.

Parameters:

attributes (Dict[str, Any]) –

The filter specification. It may have the following optional fields:

data_type: str

The data type to filter on. This can be ‘IMAGE’ or ‘VIDEO’.

job_type: str

The job type to filter on, for example ‘EXPLORE’ or ‘ANALYZE’.

search_key: str

Filter jobs across fields like job name, dataset id, and dataset name.

Returns:

A list of Entity objects representing jobs.

Return type:

List[Entity]

get_progress_info(task: BackgroundTask) ProgressInfo[source]

Gets the progress of the specified task.

Parameters:

task (BackgroundTask) – The task object to retrieve the progress information for.

Returns:

The progress information

Return type:

ProgressInfo

get_resultset_by_name(name: str) Entity | None[source]

Retrieves a resultset with the given name.

Parameters:

name (str) – The name of the resultset to retrieve.

Returns:

The Entity object representing the resultset.

Return type:

Entity

get_resultset_samples(resultset: Resultset) SampleInfoList[source]

Retrieves the samples of a resultset

Parameters:

resultset (Resultset) – The Resultset object to get samples for.

Returns:

A SampleInfoList object.

Return type:

SampleInfoList

get_resultsets(attributes: Dict[str, Any] = {}) List[Entity][source]

Retrieves information about resultsets that have the given attributes.

Parameters:

attributes (Dict[str, Any], optional) –

The filter specification. It may have the following optional fields:

search_key: str

Filter across fields like dataset id, and dataset name.

Returns:

A list of Entity objects representing resultsets.

Return type:

List[Entity]

get_server_version() str[source]

Gets the Data Explorer server version.

Returns:

The server version.

Return type:

str

get_thumbnail_images(samples: SampleInfoList) List[Image][source]

Retrieves the thumbnail images corresponding to the samples.

Parameters:

samples (SampleInfoList) – The samples to retrieve thumbnails for.

Returns:

A list of thumbnail images.

Return type:

List[Image.Image]

import_catalog(dataset: Dataset, table_name: str, csv_file_path: str, create_view: bool = True, file_name_column: str | None = None, pipeline_name: str | None = None) bool[source]

Method for importing an external catalog into a dataset.

Parameters:
  • dataset (Dataset) – The dataset to import the catalog into.

  • table_name (str) – The name of the table to create for the catalog.

  • csv_file_path (str) – The path to the CSV file containing the catalog data.

  • create_view (bool default: True) – Create a view with imported catalog and primary catalog table

  • file_name_column (str) – Name of the column in the csv file that contains the absolute filename

  • pipeline_name (str) – Name of pipeline whose primary table will be joined with the imported table. Ignored if create_view is false

Returns:

Indicates whether the operation was successful.

Return type:

bool

ingest_dataset(dataset: Dataset, data_directory: str, use_patch_featurizer: bool = True, async_req: bool = False, catalog_details: CatalogDetails | None = None) BackgroundTask | None[source]

Starts an asynchronous ingest task for the specified dataset.

Parameters:
  • dataset (Dataset) – The dataset to ingest.

  • data_directory (str) – The path to the directory containing the dataset files.

  • use_patch_featurizer (bool, optional) – Ingest dataset to enable patch-based similarity searches.

  • async_req (bool, optional) – Whether to execute the request asynchronously.

  • catalog_details (Optional[CatalogDetails]) – Parameter details for creating a catalog.

Returns:

A task object

Return type:

BackgroundTask

update_resultset(resultset: Resultset, add_list: SampleInfoList | None = None, del_list: SampleInfoList | None = None) bool[source]

Updates a resultset.

Parameters:
  • resultset (Resultset) – The resultset to be updated.

  • add_list (SampleInfoList, optional) – The list of samples to be added.

  • del_list (SampleInfoList, optional) – The list of samples to be deleted.

Returns:

Indicates whether the operation was successful.

Return type:

bool

wait_for_completion(task: BackgroundTask) ProgressInfo[source]

Waits for the specified task to complete.

Parameters:

task (BackgroundTask) – The task object to wait for.

Returns:

The progress information

Return type:

ProgressInfo
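Waiting for a task is commonly built as a polling loop around a progress check. The sketch below shows that general pattern with the standard library only; it makes no claim about how wait_for_completion works internally. wait_until_done, its timeout and the is_done callback are all hypothetical.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    poll_interval_s: float = 0.01,
    timeout_s: float = 1.0,
) -> bool:
    """Hypothetical helper: poll is_done until it returns True or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_interval_s)
    # One final check after the deadline has passed.
    return is_done()


# Usage with a fake progress check that reports completion on the third poll.
polls = {"count": 0}


def fake_is_done() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


finished = wait_until_done(fake_is_done)
print(finished, polls["count"])
```

With a real client, the callback could wrap get_progress_info; the fields of ProgressInfo are not described in this section, so that mapping is left open here.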

Module contents

akride.init(sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = '') AkriDEClient[source]

Initializes the AkriDEClient with the saas_endpoint and api_key values. The init parameters can be passed in different ways; in case multiple options are used, the order of preference is: 1. sdk_config_tuple, 2. sdk_config_dict, 3. sdk_config_file.

Get the config by signing in to the Data Explorer UI and navigating to Utilities → Get CLI/SDK config.

Parameters:
  • sdk_config_tuple (tuple) – A tuple consisting of saas_endpoint and api_key, in that order.

  • sdk_config_dict (dict) – A dictionary containing “saas_endpoint” and “api_key”.

  • sdk_config_file (str) – Path to the SDK config file downloaded from Data Explorer.
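The order of preference for akride.init can be sketched as a simple chain of checks. pick_config_source below is a hypothetical helper written for illustration, not akride code. Raising ValueError when no option is supplied is an assumption, since the behaviour in that case is not stated here.

```python
from typing import Optional, Tuple


def pick_config_source(
    sdk_config_tuple: Optional[Tuple[str, str]] = None,
    sdk_config_dict: Optional[dict] = None,
    sdk_config_file: Optional[str] = "",
) -> str:
    """Hypothetical helper: name the option that would take precedence."""
    if sdk_config_tuple is not None:
        return "sdk_config_tuple"
    if sdk_config_dict is not None:
        return "sdk_config_dict"
    # The file path defaults to an empty string, so test for a non-empty value.
    if sdk_config_file:
        return "sdk_config_file"
    # Assumption: with nothing supplied, fail loudly.
    raise ValueError("no SDK config supplied")


# Placeholder values only; real ones come from the downloaded SDK config.
chosen = pick_config_source(
    sdk_config_tuple=("https://example.invalid", "placeholder-key"),
    sdk_config_file="sdk_config.json",
)
print(chosen)
```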

Raises: