akride package
Subpackages
- akride.core package
- Subpackages
- akride.core.conf package
- akride.core.entities package
- Submodules
- akride.core.entities.bgc_job module
- akride.core.entities.catalogs module
- akride.core.entities.containers module
- akride.core.entities.datasets module
- akride.core.entities.docker_image module
- akride.core.entities.docker_pipeline module
- akride.core.entities.docker_repository module
- akride.core.entities.entity module
- akride.core.entities.jobs module
- akride.core.entities.pipeline module
- akride.core.entities.resultsets module
- akride.core.entities.sms_secrets module
- Module contents
- akride.core.models package
- Submodules
- akride.core.constants module
Constants
Constants.AKRIDE_TMP_DIR
Constants.DATASET_FILES_COLUMNS
Constants.DEBUGGING_ENABLED
Constants.DEFAULT_IMAGE_BLOB_EXPR
Constants.DEFAULT_SAAS_ENDPOINT
Constants.DEFAULT_VIDEO_BLOB_EXPR
Constants.IMPORT_CATALOG_STATUS_CHECK_ATTEMPTS
Constants.IMPORT_CATALOG_STATUS_CHECK_INTERVAL_S
Constants.INGEST_IMAGE_PARTITION_SIZE
Constants.INGEST_IMAGE_WF_TOKEN_SIZE
Constants.INGEST_VIDEO_PARTITION_SIZE
Constants.INGEST_VIDEO_WF_TOKEN_SIZE
Constants.LOG_CONFIG_FILE_NAME
Constants.PARTITIONED_TABLE_COLUMNS
Constants.PARTITION_TIME_FRAME
Constants.PROCESS_IMAGE_WF_TOKEN_SIZE
Constants.PROCESS_VIDEO_WF_TOKEN_SIZE
Constants.THUMBNAIL_AGGREGATOR_SDK_DETAILS
Constants.VIDEO_CHUNK_SIZE
- akride.core.enums module
- akride.core.exceptions module
- akride.core.types module
AnalyzeJobParams
CatalogDetails
CatalogDetails.ground_truth_class_column
CatalogDetails.ground_truth_coordinates_class_column
CatalogDetails.ground_truth_coordinates_column
CatalogDetails.prediction_class_column
CatalogDetails.prediction_coordinates_class_score_column
CatalogDetails.prediction_coordinates_column
CatalogDetails.score_column
CatalogTable
ClientManager
ClusterRetrievalSpec
Column
ConfusionMatrix
ConfusionMatrixCellSpec
CoresetSamplingSpec
JobOpSpec
JobStatistics
JoinCondition
PlotFeaturizer
SampleInfoList
SimilaritySearchSpec
- Module contents
Submodules
akride.background_task_manager module
- class akride.background_task_manager.BackgroundTaskManager[source]
Bases:
object
Helper class to manage background tasks
- is_task_running(entity_id: str, task_type: BackgroundTaskType) bool [source]
- Parameters:
entity_id (str) – Entity ID associated with the task.
task_type (BackgroundTaskType) – The type of the background task.
- Returns:
True if the task is running, False otherwise.
- Return type:
bool
- start_task(entity_id: str, task_type: BackgroundTaskType, target_function, *args, **kwargs) BackgroundTask [source]
Start a background task.
- Parameters:
task_type (BackgroundTaskType) – The type of the background task.
entity_id (str) – Entity ID associated with the task.
target_function – The target function to run.
args – Arguments for the target function.
kwargs – Keyword arguments for the target function.
- Returns:
background task object
- Return type:
BackgroundTask
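A minimal usage sketch. This is illustrative only: the BackgroundTaskType import path, its INGEST member, the entity ID, and the target function are assumptions, not confirmed by this reference.

```python
from akride.background_task_manager import BackgroundTaskManager
from akride.core.enums import BackgroundTaskType  # assumed import path

def featurize_files(batch_size=32):
    """Hypothetical long-running work submitted as a background task."""
    ...

manager = BackgroundTaskManager()
task = manager.start_task(
    entity_id="dataset-123",               # placeholder entity ID
    task_type=BackgroundTaskType.INGEST,   # assumed enum member
    target_function=featurize_files,
    batch_size=32,                         # forwarded to the target function
)
running = manager.is_task_running("dataset-123", BackgroundTaskType.INGEST)
```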
akride.client module
Copyright (C) 2025, Akridata, Inc - All Rights Reserved. Unauthorized copying of this file, via any medium is strictly prohibited
- class akride.client.AkriDEClient(saas_endpoint: str | None = None, api_key: str | None = None, sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = None)[source]
Bases:
object
Client class to connect to DataExplorer
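A minimal connection sketch; the endpoint, API key, and config path shown are placeholders.

```python
from akride.client import AkriDEClient

# Connect using an explicit endpoint and API key (placeholders shown)
client = AkriDEClient(
    saas_endpoint="https://example.dataexplorer.akridata.ai",
    api_key="YOUR_API_KEY",
)

# Or connect using the SDK config file downloaded from the Data Explorer UI
client = AkriDEClient(sdk_config_file="/path/to/sdk_config.json")

print(client.get_server_version())
```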
- abort_bgc_jobs(dataset: Dataset, job: BGCJob | None = None)[source]
Aborts background cataloging jobs for the dataset
- add_to_catalog(dataset: Dataset, table_name: str, csv_file_path: str, import_identifier: str | None = None) bool [source]
Adds new items to an existing catalog.
- Parameters:
- Returns:
Indicates whether the operation was successful.
- Return type:
bool
- attach_pipeline_to_dataset(pipeline_id, dataset_id, attachment_policy_type: AttachmentPolicyType | None = 'ON_DEMAND')[source]
Attach a pipeline to a dataset based on the given attachment policy
- attach_pipelines(dataset: Dataset, featurizer_types: Set[FeaturizerType], attachment_policy_type: AttachmentPolicyType | None = 'PUSH_MODE')[source]
Attach pipelines based on the featurizer types
- Parameters:
dataset (Dataset) – The dataset object to submit ingestion.
featurizer_types (Set[FeaturizerType]) – Featurizers to run for the dataset
attachment_policy_type (Optional[AttachmentPolicyType]) – Pipeline attachment policy type
- Return type:
None
- check_if_dataset_files_to_be_registered(dataset: Dataset, file_paths: List[str]) bool [source]
Check whether the given files are not yet registered for the dataset
- create_dataset(spec: Dict[str, Any]) Entity [source]
Creates a new dataset entity.
- Parameters:
spec (Dict[str, Any]) –
- The dataset spec. The spec should have the following fields:
- dataset_name: str
The name of the new dataset.
- dataset_namespace: str, optional
The namespace for the dataset, by default 'default'.
- data_type: DataType, optional
The type of data to store in the dataset, by default DataType.IMAGE.
- glob_pattern: str, optional
The glob pattern for the dataset. By default, for image datasets: '*(png|jpg|gif|jpeg|tiff|tif|bmp)'; for video datasets: '*(mov|mp4|avi|wmv|mpg|mpeg|mkv)'.
- sample_frame_rate: float, optional
The frame rate in frames per second (fps) for videos. Applicable only for video datasets.
- overwrite: bool, optional
Overwrite if a dataset with the same name exists.
- Returns:
The created entity
- Return type:
Entity
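A sketch of creating an image dataset with the spec fields above. The DataType import path is an assumption and the names are placeholders; client comes from the connection sketch earlier.

```python
from akride.core.enums import DataType  # assumed import path

dataset = client.create_dataset(
    spec={
        "dataset_name": "traffic-cams",   # placeholder name
        "dataset_namespace": "default",
        "data_type": DataType.IMAGE,
        "overwrite": False,
    }
)
```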
- create_docker_pipeline(spec: DockerPipelineSpec) DockerPipeline | None [source]
Creates a Pipeline using the Docker Image
- Parameters:
spec (DockerPipelineSpec) – Pipeline Specification
- Returns:
Object representing the Docker Pipeline
- Return type:
DockerPipeline
- create_featurizer_image_spec(image_name: str, description: str, command: str, repository_name: str, properties: Dict[str, Any], gpu_filter: bool | None = None, gpu_mem_fraction: float | None = None, allow_no_gpu: bool | None = None, namespace: str | None = 'default', image_tag: str | None = 'latest', name: str | None = None) DockerImageSpec [source]
Creates a DockerImageSpec object that specifies the Featurizer Docker Image to be created
Parameters:
- image_name: str
The name of the Docker Image present in the repository
- description: str
A short description of the Docker Image
- command: str
Command that is used to run the featurizer docker
- repository_name: str
Name of the repository in DE from which the Docker Image will be pulled.
- properties: Dict[str, Any]
Properties specific to the Docker Image
- gpu_filter: Optional[bool]
Flag to specify whether the Image can run on a GPU
- gpu_mem_fraction: Optional[float]
The fraction of GPU memory to be reserved for the Docker Image. Should be > 0 and <= 1
- allow_no_gpu: Optional[bool]
Flag to specify whether the Image can also be run when no GPU is available
- namespace: Optional[str]
Namespace of the Docker Image, by default 'default'
- image_tag: Optional[str]
Tag of the Docker Image in the Docker repository, by default 'latest'
- name: Optional[str]
Display name of the Docker Image on DE, by default the same as image_name
- Returns:
Object representing a Docker Image Specification
- Return type:
DockerImageSpec
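A sketch of building an image spec and registering it with register_docker_image; the image name, command, repository name, and properties are placeholders.

```python
image_spec = client.create_featurizer_image_spec(
    image_name="my-featurizer",            # placeholder image in the repository
    description="Custom patch featurizer",
    command="python /app/featurize.py",    # placeholder container command
    repository_name="my-docker-repo",      # placeholder repository registered in DE
    properties={},                         # image-specific properties
    gpu_filter=True,
    gpu_mem_fraction=0.5,
    allow_no_gpu=False,
)
docker_image = client.register_docker_image(spec=image_spec)
```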
- create_featurizer_pipeline_spec(pipeline_name: str, pipeline_description: str, featurizer_name: str, data_type: str | None = DataType.IMAGE, namespace: str | None = 'default') DockerPipelineSpec [source]
Creates a DockerPipelineSpec object that specifies the Featurizer Docker Pipeline to be created
Parameters:
- pipeline_name: str
The name of the Docker pipeline
- pipeline_description: str
A short description of the Docker Pipeline
- featurizer_name: str
Docker Image name of the featurizer to uniquely identify the image.
- data_type: Optional[str]
Data Type of the pipeline, by default DataType.IMAGE. Allowed values are DataType.IMAGE and DataType.VIDEO.
- namespace: Optional[str]
Namespace of the Docker Pipeline, by default 'default'
- Returns:
Object representing a Docker Pipeline Specification
- Return type:
DockerPipelineSpec
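Continuing the sketch above, a pipeline spec can wrap the registered featurizer image and be turned into a pipeline with create_docker_pipeline; names are placeholders.

```python
pipeline_spec = client.create_featurizer_pipeline_spec(
    pipeline_name="my-featurizer-pipeline",             # placeholder
    pipeline_description="Pipeline wrapping my-featurizer",
    featurizer_name="my-featurizer",                     # matches the registered image name
    data_type=DataType.IMAGE,
)
docker_pipeline = client.create_docker_pipeline(spec=pipeline_spec)
```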
- create_job(spec: JobSpec) Job [source]
Creates an explore job for the specified dataset.
Parameters:
- spec: JobSpec
The job specification.
Returns:
- Job
The newly created Job object.
- create_job_spec(dataset: Dataset, job_type: str | JobType = 'EXPLORE', job_name: str = '', predictions_file: str = '', cluster_algo: str | ClusterAlgoType = ClusterAlgoType.HDBSCAN, embed_algo: str | EmbedAlgoType = EmbedAlgoType.UMAP, num_clusters: int | None = None, max_images: int = 1000, catalog_table: CatalogTable | None = None, analyze_params: AnalyzeJobParams | None = None, pipeline: Pipeline | None = None, filters: List[Condition] | None = None, reference_job: Job | None = None) JobSpec [source]
Creates a JobSpec object that specifies how a job is to be created.
Parameters:
- dataset: Dataset
The dataset to explore.
- job_type: JobType, optional
The job type
- job_name: str, optional
The name of the job to create. A unique name will be generated if this is not given.
- predictions_file: str, optional
The path to the catalog file containing predictions and ground truth. This file must be formatted according to the specification at:
https://docs.akridata.ai/docs/analyze-job-creation-and-visualization
- cluster_algo: ClusterAlgoType, optional
The clustering algorithm to use.
- embed_algo: EmbedAlgoType, optional
The embedding algorithm to use.
- num_clusters: int, optional
The number of clusters to create.
- max_images: int, optional
The maximum number of images to use.
- catalog_table: CatalogTable, optional
The catalog to be used for creating this explore job. This defaults to the internal primary catalog that is created automatically when a dataset is created. default: “primary”
- analyze_params: AnalyzeJobParams, optional
Analyze job related configuration parameters
- filters: List[Condition], optional
The filters to be used to select a subset of samples for this job. These filters are applied to the catalog specified by catalog_name.
- reference_job: Job, optional
The reference job for this compare job
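A sketch of building a job spec for an explore job and submitting it with create_job. The enum import path is an assumption; dataset comes from the create_dataset sketch above.

```python
from akride.core.enums import JobType, ClusterAlgoType, EmbedAlgoType  # assumed import path

job_spec = client.create_job_spec(
    dataset=dataset,
    job_type=JobType.EXPLORE,
    job_name="traffic-cams-explore",     # placeholder; a unique name is generated if omitted
    cluster_algo=ClusterAlgoType.HDBSCAN,
    embed_algo=EmbedAlgoType.UMAP,
    max_images=1000,
)
job = client.create_job(spec=job_spec)
```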
- create_table(dataset: Dataset, table_name: str, schema: Dict[str, str], indices: List[str] | None = None) str [source]
Adds an empty external catalog to the dataset.
- Parameters:
- Returns:
Returns the absolute table name for the external catalog.
- Return type:
str
- create_view(view_name: str, description: str | None, dataset: Dataset, left_table: CatalogTable, right_table: CatalogTable, join_condition: JoinCondition, inner_join: bool = False) str [source]
Create a SQL view for visualization. Note: a left join is used by default while creating the view.
- Parameters:
view_name (str) – Name of the view to create
description (Optional[str]) – Description text
dataset (Dataset) – Dataset object
left_table (CatalogTable) – Left table of the create view query
right_table (CatalogTable) – Right table of the create view query
join_condition (JoinCondition) – JoinCondition which includes the column from the left and the right table
inner_join (bool) – Use inner join for joining the tables
- Returns:
view id
- Return type:
str
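A sketch of joining the primary catalog with an imported table. The CatalogTable and JoinCondition constructor arguments are assumptions; the table and column names are placeholders.

```python
from akride.core.types import CatalogTable, JoinCondition  # assumed import path

view_id = client.create_view(
    view_name="labels_view",                                 # placeholder
    description="Primary catalog joined with imported labels",
    dataset=dataset,
    left_table=CatalogTable(table_name="primary"),           # assumed constructor argument
    right_table=CatalogTable(table_name="labels"),           # placeholder imported table
    join_condition=JoinCondition(
        left_column="file_name", right_column="file_name"    # assumed argument names
    ),
    inner_join=False,                                        # default left join
)
```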
- get_all_columns(dataset: Dataset, table: CatalogTable) List[Column] [source]
Returns all columns for a table/view
- get_attached_pipelines(dataset: Dataset, version: str | None = None) List[Pipeline] [source]
Get the pipelines attached to a dataset, given a dataset version
- get_bgc_attached_pipeline_progress_report(dataset: Dataset, pipeline: Pipeline) BGCAttachmentJobStatus [source]
Get Background Catalog progress for the dataset attachment
- Parameters:
- Returns:
Background Catalog status for the dataset attachment
- Return type:
BGCAttachmentJobStatus
- get_catalog_by_name(dataset: Dataset, name: str) Entity | None [source]
Retrieves a catalog with the given name.
- get_catalog_data_count(dataset: Dataset, table_name: str, filter_str: str | None = None) int [source]
Retrieves the number of rows in a catalog table, based on filters
- get_catalog_tags(samples: SampleInfoList) DataFrame [source]
Retrieves the catalog tags corresponding to the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve catalog tags for.
- Returns:
A dataframe of catalog tags.
- Return type:
pd.DataFrame
- get_catalogs(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about catalogs that have the given attributes.
- Parameters:
attributes (Dict[str, Any]) –
The filter specification. It may have the following optional fields:
- name: str
filter by catalog name
- status: str
filter by catalog status; can be one of "active", "inactive", "refreshing", "offline", "invalid-config"
- Returns:
A list of Entity objects representing catalogs.
- Return type:
List[Entity]
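A sketch of filtering catalogs by the optional name and status fields; the catalog name is a placeholder.

```python
active_catalogs = client.get_catalogs(attributes={"status": "active"})
labels_catalogs = client.get_catalogs(attributes={"name": "labels"})  # placeholder name
```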
- get_compatible_reference_jobs(dataset: Dataset, pipeline: Pipeline, catalog_table: CatalogTable, search_key: str | None = None) List[Job] [source]
Retrieves jobs created from a given catalog_table that can be used as reference jobs for "JobType.COMPARE" job types
- Parameters:
- Returns:
A list of Entity objects representing jobs.
- Return type:
List[Entity]
- get_containers(attributes: Dict[str, Any] | None = None) List[Entity] [source]
Retrieves information about containers that have the given attributes.
- get_datasets(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about datasets that have the given attributes.
- get_files_to_be_processed(dataset: Dataset, pipeline: Pipeline, batch_size: int) DatasetUnprocessedFiles [source]
Get files to be processed for the dataset
- Parameters:
- Returns:
Dataset files to be processed.
- Return type:
DatasetUnprocessedFiles
- get_fullres_image_urls(samples: SampleInfoList) Dict [source]
Retrieves the full-resolution image URLs for the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve full res image urls for.
- Returns:
A dictionary containing the full-resolution image URLs for each sample.
- Return type:
Dict
- get_fullres_images(samples: SampleInfoList) List[Image] [source]
Retrieves the full-resolution images for the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve images for.
- Returns:
A list of images.
- Return type:
List[Image.Image]
- get_job_samples(job: Job, job_context: JobContext, spec: SimilaritySearchSpec | ConfusionMatrixCellSpec | ClusterRetrievalSpec | CoresetSamplingSpec, **kwargs) SampleInfoList [source]
Retrieves the samples according to the given specification.
- Parameters:
job (Job) – The Job object to get samples for.
job_context (JobContext) – The context in which the samples are requested for.
spec (Union[SimilaritySearchSpec, ConfusionMatrixCellSpec, ClusterRetrievalSpec, CoresetSamplingSpec]) – The job context spec.
**kwargs – Additional keyword arguments. Supported keyword arguments:
- iou_config_threshold: float, optional
Threshold value for iou config
- confidence_score_threshold: float, optional
Threshold value for confidence score
- Returns:
A SampleInfoList object.
- Return type:
SampleInfoList
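A sketch of pulling samples for one cluster of an explore job and fetching their thumbnails and catalog tags. The ClusterRetrievalSpec constructor arguments and the JobContext member are assumptions; job comes from the create_job sketch above.

```python
from akride.core.enums import JobContext             # assumed import path
from akride.core.types import ClusterRetrievalSpec   # assumed import path

samples = client.get_job_samples(
    job=job,
    job_context=JobContext.CLUSTER_RETRIEVAL,               # assumed enum member
    spec=ClusterRetrievalSpec(cluster_id=0, max_count=100),  # assumed arguments
)
thumbnails = client.get_thumbnail_images(samples)
tags = client.get_catalog_tags(samples)
```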
- get_job_samples_from_file_path(job: Job, file_info: List[str]) Dict [source]
Retrieves the samples corresponding to the given file paths.
- get_job_statistics(job: Job, context: JobStatisticsContext, **kwargs) JobStatistics [source]
Retrieves statistics info from an analyze job.
- Parameters:
job (Job) – The Job object to get statistics for.
context (JobStatisticsContext) – The type of statistics to retrieve.
**kwargs – Additional keyword arguments. Supported keyword arguments:
- iou_config_threshold: float, optional
Threshold value for iou config
- confidence_score_threshold: float, optional
Threshold value for confidence score
- Returns:
A job statistics object.
- Return type:
JobStatistics
- get_jobs(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about jobs that have the given attributes.
- Parameters:
attributes (Dict[str, Any]) –
The filter specification. It may have the following optional fields:
- data_type: str
The data type to filter on. This can be 'IMAGE' or 'VIDEO'.
- job_type: str
The job type to filter on: 'EXPLORE', 'ANALYZE', etc.
- search_key: str
Filter jobs across fields like job name, dataset id, and dataset name.
- Returns:
A list of Entity objects representing jobs.
- Return type:
List[Entity]
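A sketch of listing jobs using the documented filter fields; the search key is a placeholder.

```python
explore_jobs = client.get_jobs(attributes={"data_type": "IMAGE", "job_type": "EXPLORE"})
matching_jobs = client.get_jobs(attributes={"search_key": "traffic"})  # placeholder search key
```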
- get_progress_info(task: BackgroundTask) ProgressInfo [source]
Gets the progress of the specified task.
- Parameters:
task (BackgroundTask) – The task object to retrieve the progress information for.
- Returns:
The progress information
- Return type:
ProgressInfo
- get_repository_by_name(name: str) Entity | None [source]
Retrieves a Docker repository with the given name.
- get_resultset_by_id(resultset_id: str) Entity [source]
Retrieves a resultset with the given identifier.
- get_resultset_samples(resultset: Resultset, max_sample_size: int = 10000) SampleInfoList [source]
Retrieves the samples of a resultset
- Parameters:
resultset (Resultset) – The Resultset object to get samples for.
- Returns:
A SampleInfoList object.
- Return type:
SampleInfoList
- get_resultsets(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about resultsets that have the given attributes.
- get_secrets(name: str, namespace: str) SMSSecrets | None [source]
Retrieves information about SMS Secret for the given SMS secret name and namespace.
- Parameters:
- Returns:
Object representing Secrets.
- Return type:
Optional[SMSSecrets]
- get_server_version() str [source]
Get Dataexplorer server version
- Returns:
server version
- Return type:
str
- get_thumbnail_images(samples: SampleInfoList) List[Image] [source]
Retrieves the thumbnail images corresponding to the samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve thumbnails for.
- Returns:
A list of thumbnail images.
- Return type:
List[Image.Image]
- get_view_id(dataset: Dataset, view_name: str) CatalogViewInfo | None [source]
Retrieves the view id for a view of a dataset
- Parameters:
- Returns:
Returns the CatalogViewInfo object
- Return type:
Optional[CatalogViewInfo]
- import_catalog(dataset: Dataset, table_name: str, csv_file_path: str, create_view: bool = True, file_name_column: str | None = None, pipeline_name: str | None = None, import_identifier: str | None = None) bool [source]
Method for importing an external catalog into a dataset.
- Parameters:
dataset (Dataset) – The dataset to import the catalog into.
table_name (str) – The name of the table to create for the catalog.
csv_file_path (str) – The path to the CSV file containing the catalog data.
create_view (bool, default: True) – Create a view with the imported catalog and the primary catalog table
file_name_column (str) – Name of the column in the CSV file that contains the absolute filename
pipeline_name (str) – Name of the pipeline whose primary table will be joined with the imported table. Ignored if create_view is False
import_identifier (str) – Unique identifier for importing data
- Returns:
Indicates whether the operation was successful.
- Return type:
bool
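A sketch of importing a CSV catalog into the dataset; the path, table name, and column name are placeholders.

```python
imported = client.import_catalog(
    dataset=dataset,
    table_name="labels",                 # placeholder table name
    csv_file_path="/data/labels.csv",    # placeholder CSV path
    create_view=True,
    file_name_column="file_name",        # placeholder column with absolute file names
)
```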
- ingest_dataset(dataset: Dataset, data_directory: str, use_patch_featurizer: bool = True, with_clip_featurizer: bool = False, async_req: bool = False, catalog_details: CatalogDetails | None = None) BackgroundTask | None [source]
Starts an asynchronous ingest task for the specified dataset.
- Parameters:
dataset (Dataset) – The dataset to ingest.
data_directory (str) – The path to the directory containing the dataset files.
use_patch_featurizer (bool, optional) – Ingest dataset to enable patch-based similarity searches.
with_clip_featurizer (bool, optional) – Ingest dataset to enable text prompt based search.
async_req (bool, optional) – Whether to execute the request asynchronously.
catalog_details (Optional[CatalogDetails]) – Parameters details for creating a catalog
- Returns:
A task object
- Return type:
BackgroundTask
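A sketch of ingesting a dataset asynchronously and blocking on wait_for_completion; the data directory is a placeholder.

```python
task = client.ingest_dataset(
    dataset=dataset,
    data_directory="/data/traffic-cams",   # placeholder directory of files to ingest
    use_patch_featurizer=True,
    async_req=True,
)
if task is not None:
    progress = client.wait_for_completion(task)
```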
- register_docker_image(spec: DockerImageSpec) DockerImage | None [source]
Registers a Docker Image
- Parameters:
spec (DockerImageSpec) – Docker Image Specification
- Returns:
Object representing the Docker Image
- Return type:
DockerImage
- submit_bgc_job(dataset: Dataset, pipelines: List[Pipeline]) BGCJob [source]
Submits a Background Cataloging Job for the dataset
- update_resultset(resultset: Resultset, add_list: SampleInfoList | None = None, del_list: SampleInfoList | None = None) bool [source]
Updates a resultset.
- Parameters:
resultset (Resultset) – The resultset to be updated.
add_list (SampleInfoList, optional) – The list of samples to be added.
del_list (SampleInfoList, optional) – The list of samples to be deleted.
- Returns:
Indicates whether the operation was successful.
- Return type:
bool
- wait_for_completion(task: BackgroundTask) ProgressInfo [source]
Waits for the specified task to complete.
- Parameters:
task (BackgroundTask) – The task to wait for.
- Returns:
The progress information
- Return type:
ProgressInfo
akride.main module
Module contents
- akride.init(sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = '') AkriDEClient [source]
Initializes the AkriDEClient with the saas_endpoint and api_key values. The init params can be passed in different ways; if multiple options are used, the order of preference is 1. sdk_config_tuple, 2. sdk_config_dict, 3. sdk_config_file.
Get the config by signing in to the Data Explorer UI and navigating to Utilities → Get CLI/SDK config.
- Parameters:
sdk_config_tuple (tuple) – A tuple consisting of saas_endpoint and api_key, in that order
sdk_config_dict (dict) – Dictionary containing "saas_endpoint" and "api_key"
sdk_config_file (str) – Path to the SDK config file downloaded from Data Explorer
- Raises:
InvalidAuthConfigError – if the api-key/host is invalid
ServerNotReachableError – if the server is unreachable
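A sketch of both initialization styles; the endpoint, API key, and config path are placeholders.

```python
import akride

# Preference order: sdk_config_tuple, then sdk_config_dict, then sdk_config_file
client = akride.init(
    sdk_config_tuple=("https://example.dataexplorer.akridata.ai", "YOUR_API_KEY")
)

# Or with the config file downloaded from the Data Explorer UI
client = akride.init(sdk_config_file="/path/to/sdk_config.json")
```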