akride package
Subpackages
- akride.core package
- Subpackages
- Submodules
- akride.core.constants module
Constants
Constants.BLOB_TABLE_COLUMNS
Constants.DATASET_FILES_COLUMNS
Constants.DEBUGGING_ENABLED
Constants.DEFAULT_SAAS_ENDPOINT
Constants.FILE_TYPES
Constants.INGEST_FILES_COUNT_IN_ONE_PARTITION
Constants.INGEST_WF_TOKEN_SIZE
Constants.LOG_CONFIG_FILE_NAME
Constants.PARTITIONED_TABLE_COLUMNS
Constants.PARTITION_SIZE
Constants.PRIMARY_TABLE_COLUMNS
Constants.PROCESS_WF_TOKEN_SIZE
Constants.SUMMARY_TABLE_COLUMNS
Constants.THUMBNAIL_AGGREGATOR_SDK_DETAILS
- akride.core.enums module
- akride.core.exceptions module
- akride.core.types module
AnalyzeJobParams
CatalogDetails
CatalogDetails.ground_truth_class_column
CatalogDetails.ground_truth_coordinates_class_column
CatalogDetails.ground_truth_coordinates_column
CatalogDetails.prediction_class_column
CatalogDetails.prediction_coordinates_class_score_column
CatalogDetails.prediction_coordinates_column
CatalogDetails.score_column
CatalogTable
ClientManager
ClusterRetrievalSpec
Column
ConfusionMatrix
ConfusionMatrixCellSpec
CoresetSamplingSpec
JobOpSpec
JobStatistics
JoinCondition
PlotFeaturizer
SampleInfoList
SimilaritySearchSpec
- Module contents
Submodules
akride.background_task_manager module
- class akride.background_task_manager.BackgroundTaskManager[source]
Bases:
object
Helper class to manage background tasks
- is_task_running(entity_id: str, task_type: BackgroundTaskType) bool [source]
- Parameters:
entity_id (str) – Entity ID associated with the task.
task_type (BackgroundTaskType) – The type of the background task.
- Returns:
A boolean indicating whether the task is running.
- Return type:
bool
- start_task(entity_id: str, task_type: BackgroundTaskType, target_function, *args, **kwargs) BackgroundTask [source]
Start a background task.
- Parameters:
task_type (BackgroundTaskType) – The type of the background task.
entity_id (str) – Entity ID associated with the task.
target_function – The target function to run.
*args – Arguments for the target function.
**kwargs – Keyword arguments for the target function.
- Returns:
background task object
- Return type:
BackgroundTask
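A minimal usage sketch of the manager. The import path and the INGEST member of BackgroundTaskType are assumptions (neither is shown on this page), and the target function is hypothetical:

from akride.background_task_manager import BackgroundTaskManager
from akride.core.enums import BackgroundTaskType  # import path assumed

def ingest_files(directory):
    # Hypothetical worker function run in the background.
    print(f"ingesting {directory}")

manager = BackgroundTaskManager()
# BackgroundTaskType.INGEST is an assumed enum member.
task = manager.start_task("dataset-123", BackgroundTaskType.INGEST,
                          ingest_files, "/data/images")
print(manager.is_task_running("dataset-123", BackgroundTaskType.INGEST))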
akride.client module
Copyright (C) 2024, Akridata, Inc. - All Rights Reserved. Unauthorized copying of this file, via any medium, is strictly prohibited
- class akride.client.AkriDEClient(saas_endpoint: str | None = None, api_key: str | None = None, sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = None)[source]
Bases:
object
Client class to connect to DataExplorer
- add_to_catalog(dataset: Dataset, table_name: str, csv_file_path: str) bool [source]
Adds new items to an existing catalog.
- create_dataset(spec: Dict[str, Any]) Entity [source]
Creates a new dataset entity.
- Parameters:
spec (Dict[str, Any]) –
- The dataset spec. The spec should have the following fields:
- dataset_name : str
The name of the new dataset.
- dataset_namespace : str, optional
The namespace for the dataset, by default ‘default’.
- data_type : DataType, optional
The type of data to store in the dataset, by default DataType.IMAGE.
- glob_pattern : str, optional
The glob pattern for the dataset, by default ‘*(png|jpg|gif|jpeg|tiff|tif|bmp)’.
- overwrite : bool, optional
Overwrite if a dataset with the same name exists.
- Returns:
The created entity
- Return type:
Entity
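A minimal sketch of building the spec above, assuming an already-initialized client (see akride.init at the end of this page); the DataType import path is an assumption:

from akride.core.enums import DataType  # import path assumed

spec = {
    "dataset_name": "wildlife-images",  # required
    "dataset_namespace": "default",     # optional
    "data_type": DataType.IMAGE,        # optional; IMAGE is the default
    "overwrite": False,                 # optional
}
dataset = client.create_dataset(spec)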
- create_job(spec: JobSpec) Job [source]
Creates an explore job for the specified dataset.
Parameters:
- spec: JobSpec
The job specification, including the dataset to explore (see create_job_spec).
Returns:
- Job
The newly created Job object.
- create_job_spec(dataset: Dataset, job_type: str | JobType = 'EXPLORE', job_name: str = '', predictions_file: str = '', cluster_algo: str | ClusterAlgoType = 'hdbscan', embed_algo: str | EmbedAlgoType = 'umap', num_clusters: int | None = None, max_images: int = 1000, catalog_table: CatalogTable | None = None, analyze_params: AnalyzeJobParams | None = None, pipeline: Pipeline | None = None, filters: List[Condition] | None = None) JobSpec [source]
Creates a JobSpec object that specifies how a job is to be created.
Parameters:
- dataset: Dataset
The dataset to explore.
- job_type: JobType, optional
The job type.
- job_name: str, optional
The name of the job to create. A unique name will be generated if this is not given.
- predictions_file: str, optional
The path to the catalog file containing predictions and ground truth. This file must be formatted according to the specification at:
https://docs.akridata.ai/docs/analyze-job-creation-and-visualization
- cluster_algo: ClusterAlgoType, optional
The clustering algorithm to use.
- embed_algo: EmbedAlgoType, optional
The embedding algorithm to use.
- num_clusters: int, optional
The number of clusters to create.
- max_images: int, optional
The maximum number of images to use.
- catalog_table: CatalogTable, optional
The catalog to be used for creating this explore job. This defaults to the internal primary catalog that is created automatically when a dataset is created (default: “primary”).
- analyze_params: AnalyzeJobParams, optional
Analyze job related configuration parameters.
- filters: List[Condition], optional
The filters used to select a subset of samples for this job. These filters are applied to the catalog specified by catalog_table. A usage sketch follows below.
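Putting create_job_spec and create_job together; the dataset lookup and all parameter values are illustrative, and 'client' is an initialized AkriDEClient:

# Pick a dataset to explore (choosing the first entry is illustrative).
dataset = client.get_datasets()[0]

spec = client.create_job_spec(
    dataset,
    job_type="EXPLORE",
    job_name="wildlife-explore-1",
    cluster_algo="hdbscan",  # documented default
    embed_algo="umap",       # documented default
    max_images=1000,
)
job = client.create_job(spec)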
- create_view(view_name: str, description: str | None, dataset: Dataset, left_table: CatalogTable, right_table: CatalogTable, join_condition: JoinCondition) str [source]
Create a SQL view for visualization
- Parameters:
view_name (str) – Name of the view to create
description (Optional[str]) – Description text
dataset (Dataset) – Dataset object
left_table (CatalogTable) – Left table of the create view query
right_table (CatalogTable) – Right table of the create view query
join_condition (JoinCondition) – JoinCondition specifying the join columns from the left and the right table
- Returns:
view id
- Return type:
str
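A sketch of creating a view over two catalog tables. The CatalogTable and JoinCondition constructor arguments shown here are assumptions, not taken from this page; check akride.core.types for the actual fields. 'client' and 'dataset' are as in the earlier sketches:

from akride.core.types import CatalogTable, JoinCondition

# All constructor keyword arguments below are assumed.
left = CatalogTable(table_name="primary")
right = CatalogTable(table_name="external_labels")
cond = JoinCondition(left_column="file_id", right_column="file_id")

view_id = client.create_view(
    view_name="labels_view",
    description="Primary catalog joined with imported labels",
    dataset=dataset,
    left_table=left,
    right_table=right,
    join_condition=cond,
)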
- get_all_columns(dataset: Dataset, table: CatalogTable) List[Column] [source]
Returns all columns for a table/view
- get_attached_pipelines(dataset: Dataset, version: str | None = None) List[Pipeline] [source]
Get the pipelines attached to a dataset, given a dataset version
- get_catalog_by_name(dataset: Dataset, name: str) Entity | None [source]
Retrieves a catalog with the given name.
- get_catalog_tags(samples: SampleInfoList) DataFrame [source]
Retrieves the catalog tags corresponding to the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve catalog tags for.
- Returns:
A dataframe of catalog tags.
- Return type:
pd.DataFrame
- get_catalogs(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about catalogs that have the given attributes.
- Parameters:
attributes (Dict[str, Any]) –
The filter specification. It may have the following optional fields:
- name : str
Filter by catalog name
- status : str
Filter by catalog status; can be one of “active”, “inactive”, “refreshing”, “offline”, “invalid-config”
- Returns:
A list of Entity objects representing catalogs.
- Return type:
List[Entity]
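For example, using the documented filter fields (both keys are optional):

catalogs = client.get_catalogs({"name": "external_labels", "status": "active"})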
- get_datasets(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about datasets that have the given attributes.
- get_fullres_image_urls(samples: SampleInfoList) Dict [source]
Retrieves the full-resolution image urls for the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve full res image urls for.
- Returns:
A dictionary containing the full-resolution image URLs for each sample.
- Return type:
Dict
- get_fullres_images(samples: SampleInfoList) List[Image] [source]
Retrieves the full-resolution images for the given samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve images for.
- Returns:
A list of images.
- Return type:
List[Image.Image]
- get_job_samples(job: Job, job_context: JobContext, spec: SimilaritySearchSpec | ConfusionMatrixCellSpec | ClusterRetrievalSpec | CoresetSamplingSpec, **kwargs) SampleInfoList [source]
Retrieves the samples according to the given specification.
- Parameters:
job (Job) – The Job object to get samples for.
job_context (JobContext) – The context in which the samples are requested.
spec (Union[SimilaritySearchSpec, ConfusionMatrixCellSpec, ClusterRetrievalSpec, CoresetSamplingSpec]) – The job context spec.
**kwargs – Additional keyword arguments. Supported keyword arguments:
- iou_config_threshold: float, optional
Threshold value for iou config
- confidence_score_threshold: float, optional
Threshold value for confidence score
- Returns:
A SampleInfoList object.
- Return type:
SampleInfoList
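A hedged sketch of retrieving cluster samples from an explore job. The JobContext import path and member name, and the ClusterRetrievalSpec constructor arguments, are all assumptions:

from akride.core.enums import JobContext  # import path assumed
from akride.core.types import ClusterRetrievalSpec

job = client.get_jobs({"job_type": "EXPLORE"})[0]

# Constructor arguments and the enum member below are assumed for illustration.
spec = ClusterRetrievalSpec(cluster_id=0, max_count=50)
samples = client.get_job_samples(job, JobContext.CLUSTER_RETRIEVAL, spec)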
- get_job_samples_from_file_path(job: Job, file_info: List[str]) Dict [source]
Retrieves the samples corresponding to the given file paths.
- get_job_statistics(job: Job, context: JobStatisticsContext, **kwargs) JobStatistics [source]
Retrieves statistics info from an analyze job.
- Parameters:
job (Job) – The Job object to get statistics for.
context (JobStatisticsContext) – The type of statistics to retrieve.
**kwargs – Additional keyword arguments. Supported keyword arguments:
- iou_config_threshold: float, optional
Threshold value for iou config
- confidence_score_threshold: float, optional
Threshold value for confidence score
- Returns:
A job statistics object.
- Return type:
JobStatistics
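A sketch for an analyze job; the JobStatisticsContext import path and member name are assumptions, while the two keyword arguments are the documented ones:

from akride.core.enums import JobStatisticsContext  # import path assumed

job = client.get_jobs({"job_type": "ANALYZE"})[0]

# CONFUSION_MATRIX is an assumed member name.
stats = client.get_job_statistics(
    job,
    JobStatisticsContext.CONFUSION_MATRIX,
    iou_config_threshold=0.5,
    confidence_score_threshold=0.5,
)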
- get_jobs(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about jobs that have the given attributes.
- Parameters:
attributes (Dict[str, Any]) –
The filter specification. It may have the following optional fields:
- data_type : str
The data type to filter on. This can be ‘IMAGE’ or ‘VIDEO’.
- job_type : str
The job type to filter on: ‘EXPLORE’, ‘ANALYZE’, etc.
- search_key : str
Filter jobs across fields like job name, dataset id, and dataset name.
- Returns:
A list of Entity objects representing jobs.
- Return type:
List[Entity]
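For example, combining the documented filter fields:

analyze_jobs = client.get_jobs({"data_type": "IMAGE", "job_type": "ANALYZE"})
explore_jobs = client.get_jobs({"search_key": "wildlife"})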
- get_progress_info(task: BackgroundTask) ProgressInfo [source]
Gets the progress of the specified task.
- Parameters:
task (BackgroundTask) – The task object to retrieve the progress information for.
- Returns:
The progress information
- Return type:
ProgressInfo
- get_resultset_samples(resultset: Resultset) SampleInfoList [source]
Retrieves the samples of a resultset
- Parameters:
resultset (Resultset) – The Resultset object to get samples for.
- Returns:
A SampleInfoList object.
- Return type:
SampleInfoList
- get_resultsets(attributes: Dict[str, Any] = {}) List[Entity] [source]
Retrieves information about resultsets that have the given attributes.
- get_server_version() str [source]
Get the Dataexplorer server version.
- Returns:
server version
- Return type:
str
- get_thumbnail_images(samples: SampleInfoList) List[Image] [source]
Retrieves the thumbnail images corresponding to the samples.
- Parameters:
samples (SampleInfoList) – The samples to retrieve thumbnails for.
- Returns:
A list of thumbnail images.
- Return type:
List[Image.Image]
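Combining the sample and image getters above into one flow (the resultset selection is illustrative):

resultset = client.get_resultsets()[0]
samples = client.get_resultset_samples(resultset)

thumbs = client.get_thumbnail_images(samples)   # small preview images
urls = client.get_fullres_image_urls(samples)   # full-resolution URLs
images = client.get_fullres_images(samples)     # full-resolution PIL images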
- import_catalog(dataset: Dataset, table_name: str, csv_file_path: str, create_view: bool = True, file_name_column: str | None = None, pipeline_name: str | None = None) bool [source]
Method for importing an external catalog into a dataset.
- Parameters:
dataset (Dataset) – The dataset to import the catalog into.
table_name (str) – The name of the table to create for the catalog.
csv_file_path (str) – The path to the CSV file containing the catalog data.
create_view (bool, default: True) – Create a view joining the imported catalog with the primary catalog table
file_name_column (str) – Name of the column in the CSV file that contains the absolute filename
pipeline_name (str) – Name of the pipeline whose primary table will be joined with the imported table. Ignored if create_view is False
- Returns:
Indicates whether the operation was successful.
- Return type:
bool
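A sketch with illustrative table, path, and column names:

ok = client.import_catalog(
    dataset,
    table_name="external_labels",
    csv_file_path="/data/labels.csv",  # illustrative path
    create_view=True,
    file_name_column="file_path",      # illustrative column name
)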
- ingest_dataset(dataset: Dataset, data_directory: str, use_patch_featurizer: bool = True, with_clip_featurizer: bool = False, async_req: bool = False, catalog_details: CatalogDetails | None = None) BackgroundTask | None [source]
Starts an asynchronous ingest task for the specified dataset.
- Parameters:
dataset (Dataset) – The dataset to ingest.
data_directory (str) – The path to the directory containing the dataset files.
use_patch_featurizer (bool, optional) – Ingest dataset to enable patch-based similarity searches.
with_clip_featurizer (bool, optional) – Ingest dataset to enable text prompt based search.
async_req (bool, optional) – Whether to execute the request asynchronously.
catalog_details (Optional[CatalogDetails]) – Parameter details for creating a catalog
- Returns:
A task object
- Return type:
BackgroundTask
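A sketch of an asynchronous ingest followed by a blocking wait (the data directory is illustrative):

task = client.ingest_dataset(
    dataset,
    data_directory="/data/wildlife",  # illustrative path
    use_patch_featurizer=True,
    async_req=True,                   # return a BackgroundTask instead of blocking
)
progress = client.wait_for_completion(task)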
- update_resultset(resultset: Resultset, add_list: SampleInfoList | None = None, del_list: SampleInfoList | None = None) bool [source]
Updates a resultset.
- Parameters:
resultset (Resultset) – The resultset to be updated.
add_list (SampleInfoList, optional) – The list of samples to be added.
del_list (SampleInfoList, optional) – The list of samples to be deleted.
- Returns:
Indicates whether the operation was successful.
- Return type:
bool
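For example, reusing the resultset and samples from the sketches above:

ok = client.update_resultset(resultset, add_list=samples)  # True on success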
- wait_for_completion(task: BackgroundTask) ProgressInfo [source]
Waits for the specified task to complete.
- Parameters:
task (BackgroundTask) – The task object to wait for.
- Returns:
The progress information
- Return type:
ProgressInfo
akride.main module
Module contents
- akride.init(sdk_config_tuple: Tuple[str, str] | None = None, sdk_config_dict: dict | None = None, sdk_config_file: str | None = '') AkriDEClient [source]
Initializes the AkriDEClient with the saas_endpoint and api_key values. The init params can be passed in different ways; if multiple options are used, the order of preference is 1. sdk_config_tuple, 2. sdk_config_dict, 3. sdk_config_file.
Get the config by signing in to the Data Explorer UI and navigating to Utilities → Get CLI/SDK config.
- Parameters:
sdk_config_tuple (tuple) – A tuple consisting of saas_endpoint and api_key, in that order
sdk_config_dict (dict) – Dictionary containing “saas_endpoint” and “api_key”
sdk_config_file (str) – Path to the SDK config file downloaded from Dataexplorer
- Raises:
InvalidAuthConfigError – if the api_key/host is invalid
ServerNotReachableError – if the server is unreachable
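A minimal initialization sketch; the endpoint and key values are placeholders:

import akride

client = akride.init(
    sdk_config_tuple=("https://example.akridata.ai", "my-api-key")
)
print(client.get_server_version())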