Intel® Edge Data Collection Detailed Usage#
Compatible UDFs #
Features such as Annotation and Remote Storage provided by Intel® Edge Data Collection require certain fields to be present in the frame metadata to operate. These fields carry annotation information, track the data export status, and hold other important details. Typically, UDFs add these fields alongside any other fields they already add as part of their functionality.
By default, sample UDFs are provided that add these metadata fields, making them Intel® Edge Data Collection compatible. In the default workflow, EVAM is the direct source that publishes this information to DataStore; hence, any new UDF must be added to its configuration so that the metadata contains the relevant fields, keeping it Intel® Edge Data Collection compatible.
Ensure that a compatible UDF is publishing metadata for Intel® Edge Data Collection to work.
If you want to add your custom UDF and make it compatible with Intel® Edge Data Collection, ensure that the following fields are present in the metadata published by the UDF. Any missing key here will result in an error. Keys with sample values:
"last_modified": time_ns()
"export_code": 0
"annotation_type": "auto"
"annotations": {"objects":[]}
Details on the above fields:

last_modified
: Mandatory. Refers to the last time the metadata was modified. The value should be in nanoseconds.

export_code
: Mandatory. Refers to the status of the data export. The following export codes are supported:
  0: Data has not been exported
  1: Data has been exported

annotation_type
: Mandatory. Refers to the type of annotation. The following annotation types are supported:
  auto: for auto annotation
  human: for human annotation

annotations
: Mandatory. Refers to the annotations for object detection in the image.
  objects: A list of objects detected in the image. If present, each object should have the following keys:
    label: The label of the object
    score: The confidence score of the object
    bbox: The bounding box of the object. It is a list of bbox coordinates in top-left and bottom-right format, for example, [x1, y1, x2, y2]
It is important that the objects within the annotations dictionary have bounding box coordinates in top-left and bottom-right format. Some models predict bounding boxes as top-left coordinates plus width and height. In that case, the UDF must convert them to the expected top-left and bottom-right format to avoid incorrect bounding boxes during data export or visualization.
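As an illustration, here is a minimal sketch of how a custom UDF might populate these fields. The UDF class name, the placeholder detections, and the return signature are hypothetical; only the four metadata keys listed above are required by Intel® Edge Data Collection.

```python
from time import time_ns


class MyDetectionUdf:
    """Hypothetical UDF sketch that adds the metadata fields required by
    Intel(R) Edge Data Collection. The detection logic is a placeholder."""

    def process(self, frame, metadata):
        # Placeholder detections in [x, y, width, height] format,
        # as some models predict them.
        raw_detections = [{"label": "vehicle", "score": 0.92,
                           "bbox": [10, 20, 100, 50]}]

        objects = []
        for det in raw_detections:
            x, y, w, h = det["bbox"]
            objects.append({
                "label": det["label"],
                "score": det["score"],
                # Convert to the expected top-left / bottom-right format.
                "bbox": [x, y, x + w, y + h],
            })

        # Mandatory Intel(R) Edge Data Collection fields.
        metadata["last_modified"] = time_ns()
        metadata["export_code"] = 0            # 0: not yet exported
        metadata["annotation_type"] = "auto"   # produced by a model, not a human
        metadata["annotations"] = {"objects": objects}

        # Return signature here is illustrative; follow your UDF framework's contract.
        return False, frame, metadata
```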
Human Annotation Mode #
Edit the /IEdgeInsights/build/.env file and add the entries related to CVAT as shown below:

#CVAT related config
CVAT_SUPERUSER_NAME= # ex: CVAT_SUPERUSER_NAME=cvat_user
CVAT_SUPERUSER_PASSWORD= # don't provide special characters yet and ensure it is at least 9 characters long. ex: CVAT_SUPERUSER_PASSWORD=somerandom123
# For CVAT_HOST, provide the host machine IP if the system is connected to an external network - in this case, CVAT can be accessed from any machine on the same network. If the system is not connected to an external network, then provide the docker system host IP - in this case, CVAT can only be accessed from the same machine where it is running. The docker system host IP can be obtained by running the following command:
# docker network inspect bridge -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
# Generally, the IP is found to be 172.17.0.1, but it can vary, hence it is better to always obtain it from the above command
CVAT_HOST=172.17.0.1
CVAT_PORT=8443
CVAT_HTTPS=true
Note: To know more about the rules that must be followed for
CVAT_SUPERUSER_PASSWORD
to be validated successfully by Django, and for more details, click here.

Options to configure “human_annotation” in config.json
categories
Specify the categories which need to be supported in CVAT for human annotation.
Example:
"categories":["vehicle", "Person"]
enabled
Ensure that you have set
"enabled": true
for “human_annotation”. Also ensure that you have set
"enabled": false
for “auto_annotation” since human annotation and auto annotation should not be enabled at the same time.
start_date and end_date
Update the above policies in YYYY-MM-DDTHH:MM:SSZ format.
Example:
"start_date": "2022-09-02T00:00:00Z" and "end_date": "2022-09-10T00:00:00Z" if you would like to see CVAT tasks for images in the image store ranging from 2022-09-02 (inclusive) to 2022-09-10 (inclusive).

Usage in live task creation: If the date of the live feed falls within the start_date and end_date range, live tasks are created in CVAT for images from the live feed. If you would like to create live tasks forever, set "end_date": "9999-12-30T00:00:00Z". Or, if you would like to create live tasks until a particular future date, set end_date accordingly. If you would like to create live tasks from a particular future date, start_date can be set accordingly as well.

Note: Live task creation takes into account only the date and not the time part of it, i.e., THH:MM:SSZ.
Note: Ensure that both start_date and end_date are within the retention policy of DataStore.
last_x_seconds_of_images
Specify a value in seconds in order to retrieve the last “x” seconds' worth of images for creating human annotation tasks in CVAT.
Example: A value of 60 would create human annotation task(s) in CVAT for the images that were stored in Data Store in the last 60 seconds.
Note:
Provide a value of 0 if you do not want this configuration setting to override the start_date and end_date configuration settings.
This configuration setting doesn’t affect live task creation.
img_handles
Specify image handles for creating human annotation tasks in CVAT
Example:
"img_handles": ["1b387bad73", "5b387bad73"] would create CVAT task(s) for the mentioned image handles.

Note:
Provide a value of "img_handles": [] if you do not want this configuration setting to override the start_date, end_date and last_x_seconds_of_images configuration settings.
This configuration setting works regardless of whether the image handles are annotated or not.
image_type
Update this policy to “annotated”, “unannotated” or “all” to create annotation tasks in CVAT for the respective images.
Example 1:
"image_type": "annotated" would create tasks in CVAT only for images that already have annotations associated with them in influxdb. Such tasks can be termed “review tasks” and are meant for reviewing existing annotations (present due to the auto annotation module or the human annotation module).

Example 2:
"image_type": "unannotated" would create tasks in CVAT only for images that do not have any annotations associated with them in influxdb. Such tasks are meant to perform annotation for the first time.

Example 3:
"image_type": "all" would create tasks in CVAT for images regardless of whether they have annotations associated with them in influxdb. These are hybrid tasks, meant to review existing annotations for some images and to perform fresh annotation on the rest.
db_topics
The topics/tables in the data store for which human annotation tasks need to be created.
is_auto_sync
Update this policy to either true or false.

Example 1:
"is_auto_sync": true. In this case, auto sync will happen at regular intervals as specified by the auto_sync_interval_in_sec policy described below.

Example 2:
"is_auto_sync": false. In this case, no annotations from CVAT will be synced with influxdb.
auto_sync_interval_in_sec
Update this policy to a value (in seconds) that defines how often you would like to poll the CVAT tasks list. This polling is necessary to update influxdb with annotations from CVAT. Note that the update to influxdb happens only when a task has been updated since the last time auto sync ran.

Example:
"auto_sync_interval_in_sec": 60 polls the CVAT tasks list every 60 seconds, and if any particular task has been updated since the last time auto sync ran (60 seconds ago), then the task data (annotations) is synced with influxdb.

Note: If a CVAT task contains one or more images without annotation, then the respective CVAT job has to be marked as “completed” by clicking on Menu->Finish the job (just clicking “save” in the job is not enough) in order to update the annotation_type in influxdb as human for those images with empty annotation. This is necessary to have accurate annotations available for deep learning model training and to avoid false positives/negatives.

Explanation with the help of an example: Say there are 10 images in a certain CVAT task; 2 of them have annotations and the remaining 8 do not. Upon clicking “save” within the CVAT job, the next time the auto sync mechanism in Intel® Edge Data Collection runs, the annotations for the 2 images and their annotation type of human will be synced with influxdb. However, for the other 8 images that do not have any annotations, the empty annotation and the annotation type of human will be updated in influxdb by the auto sync mechanism only upon completion (and not just “save”) of the CVAT job.
max_images_per_task
Update this policy to set a limit on the maximum number of images that a CVAT task can hold.

Example:
"max_images_per_task": 100. This would mean that every task that gets created in CVAT can hold a maximum of 100 images. The actual number of images that go into a particular task is decided by the intersection with the other policies: start_date, end_date and image_type.

Note: Set this policy to 0 if you do not want any tasks to be created.
max_images_per_job
Update this policy to set a limit on the maximum number of images that a CVAT job in a given task can hold.

Example:
"max_images_per_job": 1. This would mean that every job that gets created in CVAT can hold a maximum of 1 image.

Note: Setting this policy to a value greater than 1 would mean that, when a given job is saved, all images in the job get synced back to DataStore during auto sync as ‘human’ annotated, even though the user might not have modified the annotations of all those images.
max_images_per_live_task
Update this policy to set a limit on the maximum number of images that a live task in CVAT can hold. The images in this case come from the live feed. Consider giving this policy a lower value if you face frame drop issues.

Example:
"max_images_per_live_task": 50

Note: Set this policy to 0 if you do not want any live tasks to be created.
login_retry_limit
Update this policy to a reasonable number which decides how many times login to CVAT will be attempted by Intel® Edge Data Collection before giving up.
Example:
"login_retry_limit": 5
Note: After running run.sh, open the CVAT web application by navigating to https://<host_machine_IP>:8443 in Google Chrome and log in using the superuser credentials you provided in the .env file. Perform annotation as necessary for the created tasks. Annotators within the team can also create accounts for themselves using the CVAT web app and perform annotations for the tasks that the superuser assigned to them.

IMPORTANT

Ensure that you do not log out of CVAT in the browser, as it causes issues in the interaction with Intel® Edge Data Collection. In case you do log out, restart the Intel® Edge Data Collection container after you log back in.

Google Chrome is the only browser supported by CVAT. For more details, see link.
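To make the interplay between the image selection policies above concrete, here is a minimal sketch of the documented override order (img_handles overrides last_x_seconds_of_images, which overrides start_date/end_date). The function and its arguments are purely illustrative, not the actual Intel® Edge Data Collection implementation.

```python
from datetime import datetime, timedelta, timezone


def select_images(policies):
    """Illustrative helper: derive image selection criteria from the
    human_annotation policies, following the override order documented above."""
    # Explicit image handles take precedence over every other policy.
    if policies.get("img_handles"):
        return {"img_handles": policies["img_handles"]}

    # A non-zero last_x_seconds_of_images overrides start_date/end_date.
    last_x = policies.get("last_x_seconds_of_images", 0)
    now = datetime.now(timezone.utc)
    if last_x > 0:
        return {"start": now - timedelta(seconds=last_x), "end": now}

    # Otherwise fall back to the configured start_date/end_date range.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "start": datetime.strptime(policies["start_date"], fmt).replace(tzinfo=timezone.utc),
        "end": datetime.strptime(policies["end_date"], fmt).replace(tzinfo=timezone.utc),
    }


# Example: img_handles is empty, so last_x_seconds_of_images (60) wins here.
print(select_images({
    "img_handles": [],
    "last_x_seconds_of_images": 60,
    "start_date": "2022-09-02T00:00:00Z",
    "end_date": "2022-09-10T00:00:00Z",
}))
```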
Remote Storage #
Intel® Edge Data Collection provides an optional remote storage feature that allows users to export object detection data or classification data, in MS-COCO and ImageNet data formats respectively, to persistent file storage. The file storage can be a local disk or a network mounted volume, as set by the environment variable DCAAS_STORAGE_DIR in the IEdgeInsights/build/.env file. If you have valid Azure storage credentials, the same data can also be uploaded to the Azure Blob Storage service. The data export feature can be run periodically (autosync) or as a one-time (snapshot) activity. Read further below for more.

Export behavior
Depending on the image recognition type chosen, the exported directory structure varies. For both types, topic-wise parent directories are created, within which data is exported.
Classification
For the classification type of export, a class-wise folder structure is created inside the topic directory, containing the images tagged to each class. This method of segregating images is also known as the ImageNet data format for classification. Before exporting to the relevant label directory, the type of export is identified, i.e., whether it is a binary-class export or a multi-class export. The labels present in the category list in Intel® Edge Data Collection’s config.json are used for this. The directory assignment logic for export is as follows.

If only one category is present in the category list, the export is considered a binary-class export:

For an img_handle, if the label (from the category list) is found to be the NEGATIVE class (i.e. < threshold), it is moved into the non_<label> dir.
For an img_handle, if the label (from the category list) is found to be the POSITIVE class (i.e. >= threshold), it is moved into its <label> dir.

However, if multiple categories are present in the category list, the export is considered a multi-class export:

For an img_handle, if all the labels found in the db are NEGATIVE class (i.e. < threshold), it is moved to the no_class dir.
For an img_handle, if any/all labels are found to be POSITIVE class (i.e. >= threshold), it is copied to the respective <label> dir(s).
For example, for the following metadata in influx for the table edge_video_analytics_results,

img_handle | cat | dog | person
-----------|-----|-----|-------
A          | 0   | 0   | 0
B          | -   | -   | 1
C          | 0   | 1   | 0
D          | -   | 1   | 1
if the category list has [cat, dog, person]:

└── edge_video_analytics_results
    ├── dog
    │   ├── C.jpg
    │   └── D.jpg
    ├── no_class
    │   └── A.jpg
    └── person
        ├── B.jpg
        └── D.jpg
if the category list has only [person]:

└── edge_video_analytics_results
    ├── non_person
    │   ├── A.jpg
    │   └── C.jpg
    └── person
        ├── B.jpg
        └── D.jpg
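A minimal sketch of this directory-assignment logic is shown below. It assumes, for illustration only, a per-image mapping from label to score (None standing in for “-”) and that the category list labels are the keys of a classification_threshold dict like the one in the storage config further below; it is not the actual export implementation.

```python
def assign_dirs(scores, thresholds):
    """Illustrative sketch of the classification export directory logic.
    scores: {label: score or None} for one img_handle (None ~ '-').
    thresholds: {label: threshold}; keys double as the category list."""
    categories = list(thresholds)

    if len(categories) == 1:
        # Binary-class export: one <label> dir and one non_<label> dir.
        label = categories[0]
        score = scores.get(label) or 0
        return [label] if score >= thresholds[label] else [f"non_{label}"]

    # Multi-class export: copy to every positive label dir, else no_class.
    positives = [l for l in categories if (scores.get(l) or 0) >= thresholds[l]]
    return positives if positives else ["no_class"]


thresholds = {"cat": 1, "dog": 1, "person": 1}
# Rows from the example table above (None stands for '-').
print(assign_dirs({"cat": 0, "dog": 0, "person": 0}, thresholds))    # ['no_class']
print(assign_dirs({"cat": None, "dog": 1, "person": 1}, thresholds))  # ['dog', 'person']
print(assign_dirs({"person": 1}, {"person": 1}))                      # ['person']
```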
Object Detection
For the object detection type of export, the data format used is MS-COCO. Similar to the classification export, data is grouped inside a topic directory. The data is segregated into day-wise sub-directories depending on the ingestion time into influx (YYYYMMDD format). Inside each, the data is exported in COCO format. See the sample structure below.
└── edge_video_analytics_results
    ├── 20241702
    │   ├── annotations
    │   │   └── instances_20241702.json
    │   └── images
    │       ├── x.jpg
    │       └── y.jpg
    └── 20241703
        ├── annotations
        │   └── instances_20241703.json
        └── images
            ├── a.jpg
            ├── b.jpg
            └── c.jpg
Note: If data is exported in snapshot mode, the exported data above is kept in a separate directory whose folder name is the time of the run, to distinguish it from other snapshot exports, if any.
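As an illustration only (not part of the product), here is a minimal sketch that loads one of the exported instances_YYYYMMDD.json files and prints basic counts, assuming the standard MS-COCO top-level keys images, annotations and categories; the directory path is hypothetical.

```python
import json
from pathlib import Path

# Hypothetical path to one exported day directory.
day_dir = Path("edge_video_analytics_results/20241702")
instances = json.loads((day_dir / "annotations" / "instances_20241702.json").read_text())

# Standard MS-COCO top-level keys.
print("images:     ", len(instances["images"]))
print("annotations:", len(instances["annotations"]))
print("categories: ", [c["name"] for c in instances["categories"]])
```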
Configuration
Remote storage provides the following options to configure in config.json:

"storage": {
    "mode": 0,  # 0,1,2
    "type": "disk",  # or 'azure'
    "auto_sync_interval_in_sec": 30,
    "ingestion_fps": 9,
    "db_topics": ["edge_video_analytics_results"],
    "img_recognition_configs": [
        {
            "type": "classification",  # or 'object_detection'
            "export_format": "imagenet",
            "classification_threshold": {"anomalous": 0.5},
            "filter": {
                "annotation_type": ["auto", "human"],
                "start_date": "2023-06-06T00:00:00Z",
                "end_date": "2024-12-30T13:00:00Z"
            }
        }
    ]
}
mode
Set it to 0 if you wish to turn off the remote storage feature; that would make the other options in the storage config irrelevant. The following modes are supported:

0: turn off data export functionality
1: snapshot mode: a one-time data export based on the img recognition config options provided
2: autosync mode: a periodic data export based on the img recognition config options provided
type
disk
When set to disk, the target directory is a disk path (can be host or network mounted). The path is set by DCAAS_LOCAL_STORAGE_DIR in the /IEdgeInsights/build/.env file. Ensure that the path has the necessary write permission for the Intel® Edge Data Collection user. Read further below on how to do that.
azure
When set to azure, the target directory is an Azure storage container. You must have a valid Azure storage subscription before being able to run remote storage in azure mode. The following additional account keys and Azure storage container path need to be added as environment variables to the build .env file besides DCAAS_LOCAL_STORAGE_DIR. See the Additional Info section on how to retrieve these values from Azure.

AZURE_STORAGE_CONNECTION_STRING= # <storage account connection string>
AZURE_STORAGE_CONTAINER= # <container or directory in azure cloud where data will be uploaded>
NOTE: Regardless of the storage type used - disk or azure - DCAAS_LOCAL_STORAGE_DIR must be set. When the storage type is disk, this is the target directory, whereas when the storage type is azure, a sub-directory called tmp is created inside this path and used as a temporary directory for caching local changes before they are uploaded to Azure. Unlike with the disk storage type, the contents stored in the temporary directory are not guaranteed to persist.
auto_sync_interval_in_sec
Time interval (in seconds) at which auto sync runs.
ingestion_fps
An estimate of the number of frames being published per second; defaults to 9. This helps the remote storage algorithm adjust/limit the number of rows to fetch from the DataStore microservice during auto sync. A higher value increases the maximum number of rows fetched per query, at the cost of a possibly larger response time, and vice versa. It cannot be 0 or negative.
db_topics
List of topics/tables in the data store to be queried by remote storage. At least one topic is needed if the storage mode is set to 1 or 2.
img_recognition_configs
Export sub-config related to a specific image recognition type. For individual image recognition configurations, see README.md.
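For reference, here is a minimal, hypothetical sketch of how data staged under the local tmp directory could be uploaded to Azure Blob Storage with the azure-storage-blob SDK, using the two environment variables described above. It only illustrates how the credentials are typically consumed; it is not the service's actual upload code.

```python
import os
from pathlib import Path

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Environment variables described above.
conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
container = os.environ["AZURE_STORAGE_CONTAINER"]

service = BlobServiceClient.from_connection_string(conn_str)
container_client = service.get_container_client(container)

# Hypothetical local staging directory (tmp under DCAAS_LOCAL_STORAGE_DIR).
staging_dir = Path(os.environ["DCAAS_LOCAL_STORAGE_DIR"]) / "tmp"
for path in staging_dir.rglob("*"):
    if path.is_file():
        # Preserve the relative directory layout as the blob name.
        blob_name = str(path.relative_to(staging_dir))
        with open(path, "rb") as data:
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        print("uploaded", blob_name)
```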
Permissions for storage dir
If remote storage is enabled in the config, set the target path for storage and grant permissions as follows. In the .env file, set DCAAS_STORAGE_DIR to the directory where you wish to remotely save the annotations and images (ex: /path/to/persistent/remote/storage). As a better alternative to sudo chmod 777 /path/to/persistent/remote/storage (since this allows anyone to read/write the data), ensure that write permissions on the persistent remote storage path are granted to the user mentioned in the env variable EII_USER, which defaults to eiiuser.
Configuring other options #
Configuring “options” in config.json
input_queue
Stores raw frames from video ingestion.
data_filter_queue
Stores frames from data filter.
output_queue
Stores processed frames that are ready to be published.
max_size for input_queue, data_filter_queue and output_queue
Sets the size of the mentioned queues.
Consider giving it a higher value if you face frame drop issues.
If the value is <=0, the queue size is infinite.
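As a minimal illustration of this sizing behaviour (assuming the queues follow standard Python queue semantics, where a maxsize of 0 or less means unbounded): a bounded queue that fills up cannot accept further frames, which is why a larger max_size can help with frame drops.

```python
import queue

# Bounded queue: only max_size items fit; extra frames are rejected here.
bounded = queue.Queue(maxsize=2)
for frame_id in range(3):
    try:
        bounded.put_nowait(frame_id)
    except queue.Full:
        print(f"frame {frame_id} dropped: queue full")

# maxsize <= 0 means an effectively infinite queue, so nothing is dropped.
unbounded = queue.Queue(maxsize=0)
for frame_id in range(1000):
    unbounded.put_nowait(frame_id)
print("unbounded queue holds", unbounded.qsize(), "frames")
```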
Status Notification #
Intel® Edge Data Collection exports statuses of task creation and data export progress for snapshots in the form of a json file.
For both, progress in terms of the number of images processed is continuously updated, and the status is set to COMPLETED or IN_PROGRESS depending on whether all the frames have been processed or not.
The notification file can be found in ${EII_INSTALL_PATH}/data/dcaas/dcaas_notification.json
Here is a sample notification json file (comments added for explanation).
{
"services": [ # list of enabled services with status
"remote_storage",
"human_annotation"
],
"remote_storage": { # remote storage specific status
"snapshot_list": [
"2024-02-14-193903" # list of snapshot. currently only 1 i.e.latest is mentioned
],
"2024-02-14-193903": { # snapshot instance details
"time_utc": "2024-02-14T19:39:03",
"config": {
"mode": 1,
"device": "disk",
"auto_sync_interval_in_sec": 30,
"ingestion_fps": 9,
"img_recognition_configs": [
{
"type": "classification",
"export_format": "imagenet",
"filter": {
"start_date": "2023-06-06T00:00:00Z",
"end_date": "2024-12-30T13:00:00Z",
"annotation_type": [
"auto",
"human"
]
},
"classification_threshold": {
"anomalous": 0.4
}
}
],
"db_topics": [
"edge_video_analytics_results"
]
},
"status": "IN_PROGRESS", # export status, 'COMPLETED' when done
"export_dir": "/storage/2024-02-14-193903", # exported relative dir
"processed_frames": 200, # frames exported till now
"total_frames": 402
}
},
"human_annotation": { # human annotation specific status
"time_utc": "2024-02-14T19:39:22.443201Z",
"config": {
"cvat": {
"login_retry_limit": 10,
"task_creation_policies": {
"max_images_per_job": 5,
"max_images_per_live_task": 5,
"max_images_per_task": 100
}
},
"enabled": true,
"image_selection_for_annotation_policies": {
"db_topics": [
"edge_video_analytics_results"
],
"end_date": "9999-12-30T13:00:00Z",
"image_type": "annotated",
"img_handles": [],
"last_x_seconds_of_images": 0,
"start_date": "2024-01-02T00:00:00Z"
},
"task_sync_policies": {
"auto_sync_interval_in_sec": 30,
"is_auto_sync": true
}
},
"status": "COMPLETED", # task creation status
"processed_frames": 561, # net frames processed as task are being created
"total_frames": 561
}
}
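Below is a minimal sketch, for illustration only, of polling this notification file until the latest remote storage snapshot finishes. The path follows the location mentioned above, with EII_INSTALL_PATH read from the environment.

```python
import json
import os
import time
from pathlib import Path

notification_file = (Path(os.environ["EII_INSTALL_PATH"])
                     / "data/dcaas/dcaas_notification.json")

while True:
    status = json.loads(notification_file.read_text())
    if "remote_storage" in status.get("services", []):
        remote = status["remote_storage"]
        latest = remote["snapshot_list"][0]  # latest snapshot id
        snapshot = remote[latest]
        print(f"{latest}: {snapshot['status']} "
              f"({snapshot['processed_frames']}/{snapshot['total_frames']} frames)")
        if snapshot["status"] == "COMPLETED":
            break
    time.sleep(10)  # poll every 10 seconds
```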