akride package

Subpackages

akride.core package

Submodules

akride.background_task_manager module

class akride.background_task_manager.BackgroundTaskManager[source]

Bases: object

Helper class to manage background tasks.

is_task_running(entity_id: str, task_type: BackgroundTaskType) → bool[source]

Parameters:

entity_id (str) – Entity ID associated with the task.
task_type (BackgroundTaskType) – The type of the background task.

Returns:

A boolean indicating whether the task is running.

Return type:

bool

start_task(entity_id: str, task_type: BackgroundTaskType, target_function, *args, **kwargs) → BackgroundTask[source]

Start a background task.

Parameters:

entity_id (str) – Entity ID associated with the task.
task_type (BackgroundTaskType) – The type of the background task.
target_function – The target function to run.
args – Arguments for the target function.
kwargs – Keyword arguments for the target function.

Returns:

The background task object.

Return type:

BackgroundTask
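
A minimal sketch of how is_task_running and start_task might be combined to avoid starting a duplicate task. The helper name start_if_idle is hypothetical; manager is assumed to be a BackgroundTaskManager instance, and task_type a BackgroundTaskType member (its members are not listed in this reference).

```python
from typing import Any, Callable, Optional


def start_if_idle(
    manager: Any,
    entity_id: str,
    task_type: Any,
    target_function: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Optional[Any]:
    """Start a background task unless one is already running.

    `manager` is assumed to expose is_task_running() and start_task()
    with the signatures documented above. Returns whatever start_task()
    returns, or None when a matching task is already running.
    """
    if manager.is_task_running(entity_id, task_type):
        return None
    return manager.start_task(
        entity_id, task_type, target_function, *args, **kwargs
    )
```

Returning None for the "already running" case is a design choice of this sketch, not documented library behavior.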

akride.client module

Bases: object

Client class to connect to DataExplorer

abort_bgc_jobs(dataset: Dataset, job: BGCJob | None = None)[source]

Aborts background cataloging jobs for the dataset

Parameters:

dataset (Dataset) – The dataset object whose background cataloging jobs are to be aborted.
job (Optional[BGCJob]) – The background catalog job object

Return type:

None

add_to_catalog(dataset: Dataset, table_name: str, csv_file_path: str, import_identifier: str | None = None) → bool[source]

Adds new items to an existing catalog.

Parameters:

dataset (Dataset) – The dataset to import the catalog into.
table_name (str) – The name of the table to create for the catalog.
csv_file_path (str) – The path to the CSV file containing new catalog data.
import_identifier (Optional[str]) – Unique identifier for importing data

Returns:

Indicates whether the operation was successful.

Return type:

bool

attach_pipeline_to_dataset(pipeline_id, dataset_id, attachment_policy_type: AttachmentPolicyType | None = 'ON_DEMAND')[source]

Attaches a pipeline to a dataset.

Parameters:

dataset_id (str) – The dataset id representing a dataset
pipeline_id (str) – The pipeline id representing a docker pipeline
attachment_policy_type (Optional[AttachmentPolicyType]) – Pipeline attachment policy type, by default “ON_DEMAND”

Return type:

None

attach_pipelines(dataset: Dataset, featurizer_types: Set[FeaturizerType], attachment_policy_type: AttachmentPolicyType | None = 'PUSH_MODE')[source]

Attach pipelines based on the featurizer types

Parameters:

dataset (Dataset) – The dataset object to attach the pipelines to.
featurizer_types (Set[FeaturizerType]) – Featurizers to run for the dataset
attachment_policy_type (Optional[AttachmentPolicyType]) – Pipeline attachment policy type, by default “PUSH_MODE”

Return type:

None

check_if_dataset_files_to_be_registered(dataset: Dataset, file_paths: List[str]) → bool[source]

Check if the files are not registered for the dataset

Parameters:

dataset (Dataset) – The dataset object
file_paths (List[str]) – New files to register for the dataset

Returns:

Indicates if files need to be registered

Return type:

bool
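
A minimal sketch of collecting local image paths and passing them to check_if_dataset_files_to_be_registered. The helper names are hypothetical, client is assumed to be an initialized AkriDEClient, and the extension set is copied from the default image glob pattern documented under create_dataset. Matching extensions case-insensitively is an assumption of this sketch.

```python
from pathlib import Path
from typing import Any, List

# Extensions from the default image glob pattern documented for create_dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".gif", ".jpeg", ".tiff", ".tif", ".bmp"}


def list_image_files(directory: str) -> List[str]:
    """Return sorted paths of image files found recursively under `directory`."""
    root = Path(directory)
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def needs_registration(client: Any, dataset: Any, directory: str) -> bool:
    """Ask the client whether files under `directory` still need registering."""
    file_paths = list_image_files(directory)
    if not file_paths:
        # Nothing to check, so nothing to register.
        return False
    return client.check_if_dataset_files_to_be_registered(
        dataset=dataset, file_paths=file_paths
    )
```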

create_dataset(spec: Dict[str, Any]) → Entity[source]

Creates a new dataset entity.

Parameters:

spec (Dict[str, Any]) –

The dataset spec. The spec should have the following fields:

dataset_name (str) – The name of the new dataset.
dataset_namespace (str, optional) – The namespace for the dataset, by default ‘default’.
data_type (DataType, optional) – The type of data to store in the dataset, by default DataType.IMAGE.
glob_pattern (str, optional) – The glob pattern for the dataset. By default ‘*(png|jpg|gif|jpeg|tiff|tif|bmp)’ for image datasets and ‘*(mov|mp4|avi|wmv|mpg|mpeg|mkv)’ for video datasets.
sample_frame_rate (float, optional) – The frame rate per second (fps) for videos. Applicable only for video datasets.
overwrite (bool, optional) – Overwrite if a dataset with the same name exists.

Returns:

The created entity

Return type:

Entity
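
A minimal sketch of assembling the spec dictionary described above and passing it to create_dataset. The helper names are hypothetical, and client is assumed to be an initialized AkriDEClient. The data_type and sample_frame_rate fields are left out here so that the block does not need to import the DataType enum.

```python
from typing import Any, Dict, Optional


def build_dataset_spec(
    dataset_name: str,
    dataset_namespace: Optional[str] = None,
    glob_pattern: Optional[str] = None,
    overwrite: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a spec dict using only field names listed above.

    Optional fields left as None are omitted so the documented
    defaults apply.
    """
    spec: Dict[str, Any] = {"dataset_name": dataset_name}
    optional = {
        "dataset_namespace": dataset_namespace,
        "glob_pattern": glob_pattern,
        "overwrite": overwrite,
    }
    spec.update({key: value for key, value in optional.items() if value is not None})
    return spec


def create_dataset(client: Any, spec: Dict[str, Any]) -> Any:
    """Call create_dataset(spec) on an already initialized client."""
    return client.create_dataset(spec=spec)
```

For example, build_dataset_spec("my_dataset", overwrite=True) yields a spec with only the dataset_name and overwrite fields set.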

create_docker_pipeline(spec: DockerPipelineSpec) → DockerPipeline | None[source]

Creates a Pipeline using the Docker Image

Parameters:: spec (DockerPipelineSpec) – The pipeline specification.
Returns:: Object representing the Docker Pipeline.
Return type:: Optional[DockerPipeline]

create_featurizer_image_spec(image_name: str, description: str, command: str, repository_name: str, properties: Dict[str, Any], gpu_filter: bool | None = None, gpu_mem_fraction: float | None = None, allow_no_gpu: bool | None = None, namespace: str | None = 'default', image_tag: str | None = 'latest', name: str | None = None) → DockerImageSpec[source]

Creates a DockerImageSpec object that specifies the Featurizer Docker Image to be created

Parameters:

image_name (str) – The name of the Docker Image present in the repository.
description (str) – A short description of the Docker Image.
command (str) – Command that is used to run the featurizer Docker Image.
repository_name (str) – Name of the repository in DE from which the Docker Image will be pulled.
properties (Dict[str, Any]) – Properties specific to the Docker Image.
gpu_filter (Optional[bool]) – Flag to specify whether the Image can be run on a GPU.
gpu_mem_fraction (Optional[float]) – The fraction of GPU memory to be reserved for the Docker Image. Should be > 0 and <= 1.
allow_no_gpu (Optional[bool]) – Flag to specify whether the Image can also be run if no GPU is available.
namespace (Optional[str]) – Namespace of the Docker Image, by default ‘default’.
image_tag (Optional[str]) – Tag of the Docker Image in the Docker repository, by default “latest”.
name (Optional[str]) – Display name of the Docker Image on DE, by default the same as image_name.

Returns:

Object representing a Docker Image Specification

Return type:

DockerImageSpec

create_featurizer_pipeline_spec(pipeline_name: str, pipeline_description: str, featurizer_name: str, data_type: str | None = DataType.IMAGE, namespace: str | None = 'default') → DockerPipelineSpec[source]

Creates a DockerPipelineSpec object that specifies the Featurizer Docker Pipeline to be created

Parameters:

pipeline_name (str) – The name of the Docker pipeline.
pipeline_description (str) – A short description of the Docker Pipeline.
featurizer_name (str) – Docker Image name of the featurizer to uniquely identify the image.
data_type (Optional[str]) – Data Type of the pipeline, by default DataType.IMAGE. Allowed values are DataType.IMAGE, DataType.VIDEO.
namespace (Optional[str]) – Namespace of the Docker Pipeline, by default ‘default’.

Returns:

Object representing a Docker Pipeline Specification

Return type:

DockerPipelineSpec
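
A minimal sketch of how the spec builders above might be chained with register_docker_image and create_docker_pipeline. The helper names and the description strings are hypothetical, and client is assumed to be an initialized AkriDEClient. Stopping when register_docker_image returns None is an assumption of this sketch, since this reference does not say when None is returned.

```python
from typing import Any, Dict, Optional


def check_gpu_mem_fraction(value: Optional[float]) -> Optional[float]:
    """Validate the documented range for gpu_mem_fraction: 0 < value <= 1."""
    if value is not None and not (0 < value <= 1):
        raise ValueError("gpu_mem_fraction must be > 0 and <= 1")
    return value


def register_featurizer(
    client: Any,
    image_name: str,
    repository_name: str,
    command: str,
    pipeline_name: str,
    properties: Optional[Dict[str, Any]] = None,
    gpu_mem_fraction: Optional[float] = None,
) -> Optional[Any]:
    """Register a featurizer image, then create a pipeline that uses it."""
    image_spec = client.create_featurizer_image_spec(
        image_name=image_name,
        description=f"Featurizer image {image_name}",
        command=command,
        repository_name=repository_name,
        properties=properties or {},
        gpu_mem_fraction=check_gpu_mem_fraction(gpu_mem_fraction),
    )
    image = client.register_docker_image(spec=image_spec)
    if image is None:
        # Assumption: treat a None result as "image not registered".
        return None
    pipeline_spec = client.create_featurizer_pipeline_spec(
        pipeline_name=pipeline_name,
        pipeline_description=f"Pipeline for {image_name}",
        featurizer_name=image_name,
    )
    return client.create_docker_pipeline(spec=pipeline_spec)
```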

create_job(spec: JobSpec) → Job[source]

Creates an explore job for the specified dataset.

Parameters:

spec (JobSpec) – The job specification, which identifies the dataset to explore.

Returns:

The newly created Job object.

Return type:

Job

create_job_spec(dataset: Dataset, job_type: str | JobType = 'EXPLORE', job_name: str = '', predictions_file: str = '', cluster_algo: str | ClusterAlgoType = ClusterAlgoType.HDBSCAN, embed_algo: str | EmbedAlgoType = EmbedAlgoType.UMAP, num_clusters: int | None = None, max_images: int = 1000, catalog_table: CatalogTable | None = None, analyze_params: AnalyzeJobParams | None = None, pipeline: Pipeline | None = None, filters: List[Condition] | None = None, reference_job: Job | None = None) → JobSpec[source]

Creates a JobSpec object that specifies how a job is to be created.

Parameters:

dataset (Dataset) – The dataset to explore.
job_type (JobType, optional) – The job type.
job_name (str, optional) – The name of the job to create. A unique name will be generated if this is not given.
predictions_file (str, optional) – The path to the catalog file containing predictions and ground truth. This file must be formatted according to the specification at: https://docs.akridata.ai/docs/analyze-job-creation-and-visualization
cluster_algo (ClusterAlgoType, optional) – The clustering algorithm to use.
embed_algo (EmbedAlgoType, optional) – The embedding algorithm to use.
num_clusters (int, optional) – The number of clusters to create.
max_images (int, optional) – The maximum number of images to use.
catalog_table (CatalogTable, optional) – The catalog to be used for creating this explore job. This defaults to the internal primary catalog that is created automatically when a dataset is created. default: “primary”
analyze_params (AnalyzeJobParams, optional) – Analyze job related configuration parameters.
filters (List[Condition], optional) – The filters to be used to select a subset of samples for this job. These filters are applied to the catalog specified by catalog_table.
reference_job (Job, optional) – The reference job for this compare job.
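
A minimal sketch of building a job spec and submitting it with create_job. The helper name create_explore_job is hypothetical; client is assumed to be an initialized AkriDEClient and dataset a Dataset object obtained elsewhere. Only parameters listed above are passed, and everything else is left at its default.

```python
from typing import Any


def create_explore_job(
    client: Any, dataset: Any, job_name: str = "", max_images: int = 1000
) -> Any:
    """Create an EXPLORE job spec for `dataset` and submit it."""
    spec = client.create_job_spec(
        dataset=dataset,
        job_type="EXPLORE",
        job_name=job_name,
        max_images=max_images,
    )
    return client.create_job(spec=spec)
```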

create_resultset(spec: Dict[str, Any]) → Entity[source]

Creates a new resultset entity.

Parameters:

spec (Dict[str, Any]) –

The resultset spec. The spec should have the following fields:

job (Job) – The associated job object.
name (str) – The name of the new resultset.
samples (SampleInfoList) – The samples to be included in this resultset.

Returns:

The created entity

Return type:

Entity

create_table(dataset: Dataset, table_name: str, schema: Dict[str, str], indices: List[str] | None = None) → str[source]

Adds an empty external catalog to the dataset.

Parameters:

dataset (Dataset) – The dataset to create the catalog in.
table_name (str) – The name of the table to create for the catalog.
schema (Dict[str, str]) –

The schema of the external catalog table
in the format {col_name: col_type}

Returns:

Returns the absolute table name for the external catalog.

Return type:

str

create_view(view_name: str, description: str | None, dataset: Dataset, left_table: CatalogTable, right_table: CatalogTable, join_condition: JoinCondition, inner_join: bool = False) → str[source]

Create a SQL view for visualization.

Note: Left join is used by default while creating the view.

Parameters:

view_name (str) – Name of the view to create
description (Optional[str]) – Description text
dataset (Dataset) – Dataset object
left_table (CatalogTable) – Left Table of the create view query
right_table (CatalogTable) – Right Table of the create view query
join_condition (JoinCondition) – JoinCondition which includes the column from the left and the right table
inner_join (bool) – Use inner join for joining the tables

Returns:

view id

Return type:

str

delete_catalog(catalog: Catalog) → bool[source]

Deletes a catalog object.

Parameters:: catalog (Catalog) – The catalog object to delete.
Returns:: Indicates whether the operation was successful.
Return type:: bool

delete_dataset(dataset: Dataset) → bool[source]

Deletes a dataset object.

Parameters:: dataset (Dataset) – The dataset object to delete.
Returns:: Indicates whether this entity was successfully deleted
Return type:: bool

delete_job(job: Job) → bool[source]

Deletes a job object.

Parameters:: job (Job) – The job object to delete.
Returns:: Indicates whether the operation was successful.
Return type:: bool

delete_resultset(resultset: Resultset) → bool[source]

Deletes a resultset object.

Parameters:: resultset (Resultset) – The resultset object to delete.
Returns:: Indicates whether the operation was successful.
Return type:: bool

get_all_columns(dataset: Dataset, table: CatalogTable) → List[Column][source]

Returns all columns for a table/view

Parameters:

dataset (Dataset) – Dataset object
table (CatalogTable) – Table Information

Returns:

List of columns of the table

Return type:

List[Column]

get_attached_pipelines(dataset: Dataset, version: str | None = None) → List[Pipeline][source]

Get pipelines attached for dataset given a dataset version

Parameters:

dataset (Dataset) – Dataset object
version (str, optional) – Dataset version. Defaults to None, in which case the latest version is used.

Returns:

List of pipelines attached with the dataset

Return type:

List[Pipeline]

get_bgc_attached_pipeline_progress_report(dataset: Dataset, pipeline: Pipeline) → BGCAttachmentJobStatus[source]

Get Background Catalog progress for the dataset attachment

Parameters:

dataset (Dataset) – The dataset object to retrieve background catalog jobs
pipeline (Pipeline) – The pipeline object which is attached to dataset

Returns:

Background Catalog status for the dataset attachment

Return type:

BGCAttachmentJobStatus

get_bgc_job_by_id(job_id: str) → BGCJob[source]

Get BGC job by the job id

Parameters:: job_id (str) – Job id of the triggered BGC job
Returns:: The background Catalog object
Return type:: BGCJob

get_catalog_by_name(dataset: Dataset, name: str) → Entity | None[source]

Retrieves a catalog with the given name.

Parameters:

dataset (Dataset) – The dataset to retrieve the catalog from.
name (str) – The name of the catalog to retrieve.

Returns:

The Entity object representing the catalog.

Return type:

Entity

get_catalog_data_count(dataset: Dataset, table_name: str, filter_str: str | None = None) → int[source]

Retrieves the count of the number of rows in a catalog table based on filters

Parameters:

dataset (Dataset) – The dataset that contains the catalog table.
table_name (str) – The catalog table name
filter_str (Optional[str]) – Filter the rows based on values

Returns:

The number of rows filtered

Return type:

int

get_catalog_tags(samples: SampleInfoList) → DataFrame[source]

Retrieves the catalog tags corresponding to the given samples.

Parameters:: samples (SampleInfoList) – The samples to retrieve catalog tags for.
Returns:: A dataframe of catalog tags.
Return type:: pd.DataFrame

get_catalogs(attributes: Dict[str, Any] = {}) → List[Entity][source]

Retrieves information about catalogs that have the given attributes.

Parameters:

attributes (Dict[str, Any]) –

The filter specification. It may have the following optional fields:

name (str) – Filter by catalog name.
status (str) – Filter by catalog status. Can be one of “active”, “inactive”, “refreshing”, “offline”, “invalid-config”.

Returns:

A list of Entity objects representing catalogs.

Return type:

List[Entity]
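
A minimal sketch of building the attributes filter for get_catalogs, rejecting status values that are not in the list above. The helper name and the client-side validation are hypothetical conveniences and not part of the SDK.

```python
from typing import Any, Dict, Optional

# Status values listed in the get_catalogs description above.
CATALOG_STATUSES = ("active", "inactive", "refreshing", "offline", "invalid-config")


def build_catalog_filter(
    name: Optional[str] = None, status: Optional[str] = None
) -> Dict[str, Any]:
    """Return an attributes dict containing only the filters that were set."""
    if status is not None and status not in CATALOG_STATUSES:
        raise ValueError(f"status must be one of {CATALOG_STATUSES}")
    attributes: Dict[str, Any] = {}
    if name is not None:
        attributes["name"] = name
    if status is not None:
        attributes["status"] = status
    return attributes
```

The result can then be passed as client.get_catalogs(attributes=build_catalog_filter(status="active")), assuming client is an initialized AkriDEClient.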

get_compatible_reference_jobs(dataset: Dataset, pipeline: Pipeline, catalog_table: CatalogTable, search_key: str | None = None) → List[Job][source]

Retrieves jobs created from a given catalog_table which can be used to create “JobType.COMPARE” job types

Parameters:

dataset (Dataset) – The dataset to explore.
pipeline (Pipeline) – The pipeline to use.
catalog_table (CatalogTable) – The catalog table to use for creating compare job.
search_key (str) – Filter jobs across fields like job name

Returns:

A list of Entity objects representing jobs.

Return type:

List[Entity]

get_containers(attributes: Dict[str, Any] | None = None) → List[Entity][source]

Retrieves information about containers that have the given attributes.

Parameters:

attributes (Dict[str, Any], optional) –

The filter specification. It may have the following optional fields:

filter_by_name (str) – Filter by container name.
search_by_name (str) – Search by container name.

Returns:

A list of Entity objects representing containers.

Return type:

List[Entity]

get_dataset_by_name(name: str) → Entity | None[source]

Retrieves a dataset with the given name.

Parameters:: name (str) – The name of the dataset to retrieve.
Returns:: The Entity object representing the dataset.
Return type:: Entity
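
Since get_dataset_by_name is annotated as returning Entity | None, callers may want to guard against a missing dataset before acting on the result. A minimal sketch that pairs it with delete_dataset follows. The helper name is hypothetical, client is assumed to be an initialized AkriDEClient, and treating None as "no dataset with that name" is an assumption.

```python
from typing import Any


def delete_dataset_by_name(client: Any, name: str) -> bool:
    """Delete the dataset called `name`; return False if lookup yields None."""
    dataset = client.get_dataset_by_name(name)
    if dataset is None:
        return False
    return client.delete_dataset(dataset)
```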

get_datasets(attributes: Dict[str, Any] = {}) → List[Entity][source]

Retrieves information about datasets that have the given attributes.

Parameters:

attributes (Dict[str, Any], optional) –

The filter specification. It may have the following optional fields:

search_key (str) – Filter across fields like dataset id, and dataset name.

Returns:

A list of Entity objects representing datasets.

Return type:

List[Entity]

get_docker_image(name: str) → Entity | None[source]

Retrieves a Docker Image with the given name.

Parameters:: name (str) – The name of the Docker Image to retrieve
Returns:: The Entity object representing the Docker Image.
Return type:: Entity

get_files_to_be_processed(dataset: Dataset, pipeline: Pipeline, batch_size: int) → DatasetUnprocessedFiles[source]

Get files to be processed for the dataset

Parameters:

dataset (Dataset) – The dataset object
pipeline (Pipeline) – The associated pipeline for which the files have to be obtained
batch_size (int) – Number of files to be retrieved

Returns:

Dataset files to be processed.

Return type:

DatasetUnprocessedFiles

get_fullres_image_urls(samples: SampleInfoList) → Dict[source]

Retrieves the full-resolution image URLs for the given samples.

Parameters:: samples (SampleInfoList) – The samples to retrieve full res image urls for.
Returns:: A dictionary containing the full-resolution image URLs for each sample.
Return type:: Dict

get_fullres_images(samples: SampleInfoList) → List[Image][source]

Retrieves the full-resolution images for the provided samples.

Parameters:: samples (SampleInfoList) – The samples to retrieve images for.
Returns:: A list of images.
Return type:: List[Image.Image]

get_job_by_name(name: str) → Job[source]

Retrieves a job with the given name.

Parameters:: name (str) – The name of the job to retrieve.
Returns:: The Entity object representing the job.
Return type:: Entity

get_job_display_panel(job: Job) → str[source]

Retrieves the job panel URI in Data Explorer.

Parameters:: job (Job) – The Job object to be queried.
Returns:: The job panel URL.
Return type:: str

get_job_samples(job: Job, job_context: JobContext, spec: SimilaritySearchSpec | ConfusionMatrixCellSpec | ClusterRetrievalSpec | CoresetSamplingSpec, **kwargs) → SampleInfoList[source]

Retrieves the samples according to the given specification.

Parameters:

job (Job) – The Job object to get samples for.
job_context (JobContext) – The context in which the samples are requested for.
spec (Union[SimilaritySearchSpec, ConfusionMatrixCellSpec, ClusterRetrievalSpec, CoresetSamplingSpec]) – The job context spec.
**kwargs – Additional keyword arguments. Supported keyword arguments:

iou_config_threshold (float, optional) – Threshold value for iou config.
confidence_score_threshold (float, optional) – Threshold value for confidence score.

Returns:

A SampleInfoList object.

Return type:

SampleInfoList

get_job_samples_from_file_path(job: Job, file_info: List[str]) → Dict[source]

Retrieves the samples corresponding to the given file paths.

Parameters:

job (Job) – The Job object to get samples for.
file_info (List[str]) – List of file_paths for the images of interest

Returns:

A dictionary that maps each file_path to its point_ids

Return type:

Dict

get_job_statistics(job: Job, context: JobStatisticsContext, **kwargs) → JobStatistics[source]

Retrieves statistics info from an analyze job.

Parameters:

job (Job) – The Job object to get statistics for.
context (JobStatisticsContext) – The type of statistics to retrieve.
**kwargs – Additional keyword arguments. Supported keyword arguments:

iou_config_threshold (float, optional) – Threshold value for iou config.
confidence_score_threshold (float, optional) – Threshold value for confidence score.

Returns:

A job statistics object.

Return type:

JobStatistics

get_jobs(attributes: Dict[str, Any] = {}) → List[Entity][source]

Retrieves information about jobs that have the given attributes.

Parameters:

attributes (Dict[str, Any]) –

The filter specification. It may have the following optional fields:

data_type (str) – The data type to filter on. This can be ‘IMAGE’ or ‘VIDEO’.
job_type (str) – The job type to filter on - ‘EXPLORE’, ‘ANALYZE’ etc.
search_key (str) – Filter jobs across fields like job name, dataset id, and dataset name.

Returns:

A list of Entity objects representing jobs.

Return type:

List[Entity]

get_progress_info(task: BackgroundTask) → ProgressInfo[source]

Gets the progress of the specified task.

Parameters:: task (BackgroundTask) – The task object to retrieve the progress information for.
Returns:: The progress information
Return type:: ProgressInfo

get_repository_by_name(name: str) → Entity | None[source]

Retrieves a Docker repository with the given name.

Parameters:: name (str) – The name of the Repository to retrieve.
Returns:: Object representing the Docker Repository.
Return type:: Entity

get_resultset_by_id(resultset_id: str) → Entity[source]

Retrieves a resultset with the given identifier.

Parameters:: resultset_id (str) – The identifier of the resultset to retrieve.
Returns:: The Entity object representing the resultset.
Return type:: Entity

get_resultset_by_name(name: str) → Entity | None[source]

Retrieves a resultset with the given name.

Parameters:: name (str) – The name of the resultset to retrieve.
Returns:: The Entity object representing the resultset.
Return type:: Entity

get_resultset_samples(resultset: Resultset, max_sample_size: int = 10000) → SampleInfoList[source]

Retrieves the samples of a resultset

Parameters:: resultset (Resultset) – The Resultset object to get samples for.
Returns:: A SampleInfoList object.
Return type:: SampleInfoList

get_resultsets(attributes: Dict[str, Any] = {}) → List[Entity][source]

Retrieves information about resultsets that have the given attributes.

Parameters:

attributes (Dict[str, Any], optional) –

The filter specification. It may have the following optional fields:

search_key (str) – Filter across fields like dataset id, and dataset name.

Returns:

A list of Entity objects representing resultsets.

Return type:

List[Entity]

get_secrets(name: str, namespace: str) → SMSSecrets | None[source]

Retrieves information about SMS Secret for the given SMS secret name and namespace.

Parameters:

name (str) – Filter across SMS Secret Key
namespace (str) – Filter across SMS Secret Namespace

Returns:

Object representing Secrets.

Return type:

SMSSecrets

get_server_version() → str[source]

Gets the Data Explorer server version.

Returns:: server version
Return type:: str

get_thumbnail_images(samples: SampleInfoList) → List[Image][source]

Retrieves the thumbnail images corresponding to the samples.

Parameters:: samples (SampleInfoList) – The samples to retrieve thumbnails for.
Returns:: A list of thumbnail images.
Return type:: List[Image.Image]
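
A minimal sketch of writing the returned thumbnails to disk. It assumes, based on the documented return type List[Image.Image], that each item is a Pillow image with a save() method. The helper name and the file naming scheme are hypothetical, and client is assumed to be an initialized AkriDEClient.

```python
from pathlib import Path
from typing import Any, List


def save_thumbnails(client: Any, samples: Any, out_dir: str) -> List[Path]:
    """Fetch thumbnails for `samples` and save each one as a PNG file."""
    images = client.get_thumbnail_images(samples)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for index, image in enumerate(images):
        path = target / f"thumbnail_{index:05d}.png"
        # Pillow infers the output format from the file extension.
        image.save(path)
        saved.append(path)
    return saved
```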

get_view_id(dataset: Dataset, view_name: str) → CatalogViewInfo | None[source]

Retrieves the view id for a view of a dataset

Parameters:

dataset (Dataset) – The dataset to get the view id from
view_name (str) – The name of the view, to get the id

Returns:

Returns the CatalogViewInfo object

Return type:

Optional[CatalogViewInfo]

import_catalog(dataset: Dataset, table_name: str, csv_file_path: str, create_view: bool = True, file_name_column: str | None = None, pipeline_name: str | None = None, import_identifier: str | None = None) → bool[source]

Method for importing an external catalog into a dataset.

Parameters:

dataset (Dataset) – The dataset to import the catalog into.
table_name (str) – The name of the table to create for the catalog.
csv_file_path (str) – The path to the CSV file containing the catalog data.
create_view (bool, default: True) – Create a view with imported catalog and primary catalog table
file_name_column (Optional[str]) – Name of the column in the csv file that contains the absolute filename
pipeline_name (Optional[str]) – Name of pipeline whose primary table will be joined with the imported table. Ignored if create_view is false
import_identifier (Optional[str]) – Unique identifier for importing data

Returns:

Indicates whether the operation was successful.

Return type:

bool
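
A minimal sketch of preparing a CSV file with the standard csv module and handing it to import_catalog. The helper names are hypothetical, client is assumed to be an initialized AkriDEClient, and the column layout of rows is whatever the caller supplies; this reference does not prescribe column names beyond the one passed as file_name_column.

```python
import csv
from typing import Any, Dict, List


def write_catalog_csv(csv_file_path: str, rows: List[Dict[str, Any]]) -> None:
    """Write `rows` to a CSV file, using the keys of the first row as header."""
    if not rows:
        raise ValueError("rows must contain at least one record")
    fieldnames = list(rows[0].keys())
    with open(csv_file_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def import_rows(
    client: Any,
    dataset: Any,
    table_name: str,
    csv_file_path: str,
    rows: List[Dict[str, Any]],
    file_name_column: str,
) -> bool:
    """Write `rows` to `csv_file_path`, then import that file as a catalog."""
    write_catalog_csv(csv_file_path, rows)
    return client.import_catalog(
        dataset=dataset,
        table_name=table_name,
        csv_file_path=csv_file_path,
        create_view=True,
        file_name_column=file_name_column,
    )
```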

ingest_dataset(dataset: Dataset, data_directory: str, use_patch_featurizer: bool = True, with_clip_featurizer: bool = False, async_req: bool = False, catalog_details: CatalogDetails | None = None) → BackgroundTask | None[source]

Starts an asynchronous ingest task for the specified dataset.

Parameters:

dataset (Dataset) – The dataset to ingest.
data_directory (str) – The path to the directory containing the dataset files.
use_patch_featurizer (bool, optional) – Ingest dataset to enable patch-based similarity searches.
with_clip_featurizer (bool, optional) – Ingest dataset to enable text prompt based search.
async_req (bool, optional) – Whether to execute the request asynchronously.
catalog_details (Optional[CatalogDetails]) – Parameter details for creating a catalog

Returns:

A task object

Return type:

BackgroundTask

publish_resultset(resultset: Resultset) → bool[source]

Publishes a resultset.

Parameters:: resultset (Resultset) – The resultset to be published.
Returns:: Indicates whether the operation was successful.
Return type:: bool

register_docker_image(spec: DockerImageSpec) → DockerImage | None[source]

Registers a Docker Image

Parameters:: spec (DockerImageSpec) – The Docker Image specification.
Returns:: Object representing the Docker Image.
Return type:: Optional[DockerImage]

submit_bgc_job(dataset: Dataset, pipelines: List[Pipeline]) → BGCJob[source]

Submits a Background Cataloging Job for the dataset

Parameters:

dataset (Dataset) – The dataset object to submit ingestion.
pipelines (List[Pipeline]) – Pipelines to run for the job

Returns:

The background Catalog object

Return type:

BGCJob
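
A minimal sketch that looks up the pipelines already attached to a dataset with get_attached_pipelines and submits a background cataloging job for them. The helper name is hypothetical, client is assumed to be an initialized AkriDEClient, and skipping submission when no pipelines are attached is a choice made for this sketch.

```python
from typing import Any, Optional


def submit_bgc_for_attached_pipelines(client: Any, dataset: Any) -> Optional[Any]:
    """Submit a BGC job for all pipelines attached to `dataset`.

    Returns the BGCJob from submit_bgc_job(), or None when the dataset
    has no attached pipelines.
    """
    pipelines = client.get_attached_pipelines(dataset)
    if not pipelines:
        return None
    return client.submit_bgc_job(dataset=dataset, pipelines=pipelines)
```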

update_resultset(resultset: Resultset, add_list: SampleInfoList | None = None, del_list: SampleInfoList | None = None) → bool[source]

Updates a resultset.

Parameters:

resultset (Resultset) – The resultset to be updated.
add_list (SampleInfoList, optional) – The list of samples to be added.
del_list (SampleInfoList, optional) – The list of samples to be deleted.

Returns:

Indicates whether the operation was successful.

Return type:

bool
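
Both update_resultset and publish_resultset report success as a bool, so they can be chained with an early exit. A minimal sketch, assuming client is an initialized AkriDEClient; the helper name is hypothetical.

```python
from typing import Any


def add_and_publish(client: Any, resultset: Any, new_samples: Any) -> bool:
    """Add `new_samples` to `resultset`, then publish it if the update succeeded."""
    updated = client.update_resultset(resultset=resultset, add_list=new_samples)
    if not updated:
        return False
    return client.publish_resultset(resultset=resultset)
```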

wait_for_completion(task: BackgroundTask) → ProgressInfo[source]

Waits for the specified task to complete.

Parameters:: task (BackgroundTask) – The task object to wait for.
Returns:: The progress information
Return type:: ProgressInfo
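
A minimal sketch combining ingest_dataset with wait_for_completion. The helper name is hypothetical, and client is assumed to be an initialized AkriDEClient. Because ingest_dataset is annotated as returning BackgroundTask | None, the sketch returns None when no task object comes back.

```python
from typing import Any, Optional


def ingest_and_wait(client: Any, dataset: Any, data_directory: str) -> Optional[Any]:
    """Start an asynchronous ingest and block until the task completes."""
    task = client.ingest_dataset(
        dataset=dataset,
        data_directory=data_directory,
        async_req=True,
    )
    if task is None:
        # No task object was returned, so there is nothing to wait on.
        return None
    return client.wait_for_completion(task)
```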

akride.main module

akride.main.main()[source]

Module contents

akride.init(sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = '') → AkriDEClient[source]

Initializes the AkriDEClient with the saas_endpoint and api_key values. The init params can be passed in different ways; in case multiple options are used to pass the init params, the order of preference is: 1. sdk_config_tuple, 2. sdk_config_dict, 3. sdk_config_file.

Get the config by signing in to the Data Explorer UI and navigating to Utilities → Get CLI/SDK config.

Parameters:

sdk_config_tuple (tuple) – A tuple consisting of saas_endpoint and api_key, in that order.
sdk_config_dict (dict) – A dictionary containing “saas_endpoint” and “api_key”.
sdk_config_file (str) – Path to the SDK config file downloaded from Data Explorer.

Raises:

InvalidAuthConfigError – if the api-key/host is invalid.
ServerNotReachableError – if the server is unreachable.
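
A minimal sketch of initializing the client from a dictionary. The helper names are hypothetical. The akride import is placed inside connect() so that the block can be loaded even where the package is not installed; calling connect() requires the package and a reachable server.

```python
from typing import Any, Dict


def make_sdk_config(saas_endpoint: str, api_key: str) -> Dict[str, str]:
    """Build the dictionary form of the SDK config described above."""
    return {"saas_endpoint": saas_endpoint, "api_key": api_key}


def connect(config: Dict[str, str]) -> Any:
    """Initialize and return an AkriDEClient from a config dictionary."""
    # Imported lazily; requires the akride package to be installed.
    import akride

    return akride.init(sdk_config_dict=config)
```

Per the Raises section above, a caller of connect() may want to handle InvalidAuthConfigError and ServerNotReachableError.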