Intel® Edge Data Collection Detailed Usage#

Compatible UDFs #

Features like Annotation and Remote Storage provided by Intel® Edge Data Collection require certain fields to be present in the frame metadata to operate. These fields carry annotation information, track data export status, and hold other important details. Usually, UDFs add these fields, along with whatever other fields they already add as part of their functionality.

By default, certain sample UDFs are provided that add these metadata fields, making them Intel® Edge Data Collection compatible. In the default workflow, EVAM is the direct source that publishes this information to DataStore, hence any new UDF must be added to its configuration so that the metadata carries the relevant fields, keeping it Intel® Edge Data Collection compatible.

Ensure that a compatible UDF is publishing metadata for Intel® Edge Data Collection to work.

If you want to add your own custom UDF and make it compatible with Intel® Edge Data Collection, ensure that the following fields are present in the metadata published by the UDF. Any missing key here will result in an error. Keys with sample values:

  "last_modified": time_ns()
  "export_code": 0
  "annotation_type": "auto"
  "annotations": {"objects":[]}

Details on the above fields:

  • last_modified: Mandatory. Refers to the last time the metadata was modified. The value should be in nanoseconds.

  • export_code: Mandatory. Refers to the status of the data export. The following export codes are supported:

    • 0: Data has not been exported

    • 1: Data has been exported

  • annotation_type: Mandatory. Refers to the type of annotation. The following annotation types are supported:

    • auto: for auto annotation

    • human: for human annotation

  • annotations: Mandatory. Refers to the annotations for object detection in the image.

    • objects: A list of objects detected in the image. If present, each object should have the following keys:

      • label: The label of the object

      • score: The confidence score of the object

      • bbox: The bounding box of the object. It is a list of bbox coordinates in top-left and bottom-right format, for example, [x1, y1, x2, y2].

      It is important that the objects within the annotations dictionary have bounding box dimensions in top-left and bottom-right format. Some models predict the bounding box as top-left coordinates plus width and height; in that case, the UDF must convert it to the expected top-left and bottom-right format to avoid incorrect bounding box dimensions during data export or visualization. A minimal UDF sketch follows the note below.


  • Note

    At present, Intel® Edge Data Collection cannot publish or subscribe to more than one topic. Ensure that no more than one topic is added for the publisher and for the subscriber in the Intel® Edge Data Collection interface section in config.json and/or the consolidated build/eii_config.json.
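
For illustration, below is a minimal sketch of a Python UDF that attaches these fields. The detection output, the class name and the process() return shape are assumptions made for the example; only the metadata keys and the bbox conversion follow the requirements above.

  from time import time_ns

  class SampleCompatibleUdf:
      """Sketch of a UDF whose metadata is Intel® Edge Data Collection compatible."""

      def process(self, frame, metadata):
          # Placeholder model output: (label, score, (x, y, w, h)) per object,
          # where the bbox is given as top-left corner plus width and height.
          detections = [("person", 0.92, (100, 50, 40, 80))]

          objects = []
          for label, score, (x, y, w, h) in detections:
              objects.append({
                  "label": label,
                  "score": score,
                  # Convert top-left/width/height to the expected
                  # top-left/bottom-right format: [x1, y1, x2, y2].
                  "bbox": [x, y, x + w, y + h],
              })

          # Mandatory Intel® Edge Data Collection fields.
          metadata["last_modified"] = time_ns()  # nanoseconds
          metadata["export_code"] = 0            # 0: data not yet exported
          metadata["annotation_type"] = "auto"   # UDF-generated annotation
          metadata["annotations"] = {"objects": objects}

          # Return shape assumed from typical EII Python UDFs
          # (drop_frame, updated_frame, metadata); match your UDF runner.
          return False, None, metadata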

Human Annotation Mode #

  • Edit the /IEdgeInsights/build/.env file and add the CVAT-related entries as shown below:

    #CVAT related config
    CVAT_SUPERUSER_NAME= # ex: CVAT_SUPERUSER_NAME=cvat_user
    CVAT_SUPERUSER_PASSWORD= # do not use special characters yet, and ensure it is at least 9 characters long. ex: CVAT_SUPERUSER_PASSWORD=somerandom123
    # For CVAT_HOST, provide the host machine IP if the system is connected to
    # an external network - CVAT can then be accessed from any machine on the
    # same network. If the system is not connected to an external network,
    # provide the docker host IP instead - CVAT can then be accessed only from
    # the machine where it is running. The docker host IP can be obtained by
    # running the following command:
    # docker network inspect bridge -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
    # Generally, the IP is found to be 172.17.0.1, but it can vary, so it is
    # better to always obtain it from the above command
    CVAT_HOST=172.17.0.1
    CVAT_PORT=8443
    CVAT_HTTPS=true
    

    Note: For details on the rules that CVAT_SUPERUSER_PASSWORD must follow in order to be validated successfully by Django, refer to Django’s password validation documentation.

  • Options to configure “human_annotation” in config.json (a consolidated example follows this list):

    • categories

      • Specify the categories which need to be supported in CVAT for human annotation.

      • Example: "categories":["vehicle", "Person"]

    • enabled

      • Ensure that you have set "enabled": true for “human_annotation”

      • Also ensure that you have set "enabled": false for “auto_annotation” since human annotation and auto annotation should not be enabled at the same time.

    • start_date and end_date

      • Update the above policies in YYYY-MM-DDTHH:MM:SSZ format.

      • Example: "start_date": "2022-09-02T00:00:00Z" and "end_date": "2022-09-10T00:00:00Z" if you would like to see CVAT tasks for images in image store ranging from 2022-09-02 (inclusive) to 2022-09-10 (inclusive).

      • Usage in live task creation: If the date of the live feed falls within the start_date and end_date range, live tasks are created in CVAT for images from the live feed. If you would like to create live tasks forever, set "end_date": "9999-12-30T00:00:00Z". If you would like to create live tasks only until a particular future date, set end_date accordingly. Likewise, to start creating live tasks from a particular future date, start_date can be set accordingly as well.

        • Note: live task creation takes into account only the date and not the time part of it i.e., THH:MM:SSZ

      • Note: Ensure that both start_date and end_date are within the retention policy of DataStore.

    • last_x_seconds_of_images

      • Specify a value in seconds in order to retrieve the last “x” seconds worth of images for creating human annotation tasks in CVAT

      • Example: A value of 60 would create human annotation task(s) in CVAT for the images that were stored in Data Store in the last 60 seconds.

      • Note:

        • Provide a value of 0 if you do not want this configuration setting to override the start_date and end_date configuration settings.

        • This configuration setting doesn’t affect live task creation

    • img_handles

      • Specify image handles for creating human annotation tasks in CVAT

      • Example: "img_handles": ["1b387bad73", "5b387bad73"] would create CVAT task(s) for the mentioned image handles

      • Note:

        • Provide a value of "img_handles": [] if you do not want this configuration setting to override the start_date, end_date and last_x_seconds_of_images configuration settings.

        • This configuration setting works regardless of whether image handles are annotated or not.

    • image_type

      • Update this policy to “annotated”, “unannotated” or “all” to create annotation tasks in CVAT for the respective images.

      • Example 1: "image_type": "annotated" would create tasks in CVAT only for images that already have annotation associated with them in influxdb. Such tasks can be termed as “review tasks” that are meant for reviewing already existing annotation (that are present due to auto annotation module or human annotation module)

      • Example 2: "image_type": "unannotated" would create tasks in CVAT only for images that do not have any annotation associated with them in influxdb. Such tasks are meant to perform annotation for the first time.

      • Example 3: "image_type": "all" would create tasks in CVAT for images regardless of whether they have annotation associated with them in influxdb. These are hybrid tasks that are meant to perform review of existing annotation for certain images and to perform fresh annotation on the rest of the images.

    • db_topics

      • The topics/tables in the data store that need human annotation tasks

    • is_auto_sync

      • Update this policy to either true or false.

      • Example 1: "is_auto_sync": true. In this case, auto sync will happen in regular intervals as specified by the auto_sync_interval_in_sec policy in the below point.

      • Example 2: "is_auto_sync": false. In this case, no annotations from CVAT will be synced with influxdb.

    • auto_sync_interval_in_sec

      • Update this policy to a value (in seconds) that defines how often the CVAT tasks list is polled. This polling is necessary to update influxdb with annotations from CVAT. Note that the update to influxdb happens only when a task has been updated since the last time auto sync ran.

      • Example: "auto_sync_interval_in_sec": 60 polls the CVAT tasks list every 60 seconds and if any particular task happens to be updated since the last time auto sync ran (60 seconds ago), then the task data (annotation) is synced with influxdb.

      • Note: If there is a CVAT task that contains one or more images without annotation, the respective CVAT job has to be marked as “completed” by clicking Menu->Finish the job (just clicking “save” in the job is not enough) in order to update the annotation_type in influxdb to human for those images with empty annotation. This is necessary to have accurate annotations available for deep learning model training, to avoid false positives/negatives.

        • Explanation with the help of an example: Let us say that there are 10 images in a certain CVAT task; 2 of them have annotations and the other 8 do not. Upon clicking “save” within the CVAT job, the next time the auto sync mechanism in Intel® Edge Data Collection runs, the annotations for the 2 images, with their annotation type as human, will be synced with influxdb. For the other 8 images that have no annotations, however, the empty annotation and the annotation type human will be updated in influxdb by the auto sync mechanism only upon completion (and not just “save”) of the CVAT job.

    • max_images_per_task

      • Update this policy to set a limit on the maximum number of images that a CVAT task can hold.

      • Example: "max_images_per_task": 100. This would mean that every task that gets created in CVAT can hold a maximum of 100 images. The actual amount of images that goes into a particular task is decided by the intersection with other policies: start_date, end_date and image_type

      • Note: Set this policy to 0 if you do not want any tasks to be created.

    • max_images_per_job

      • Update this policy to set a limit on the maximum number of images that a CVAT job in a given task can hold.

      • Example: "max_images_per_job": 1. This would mean that every job that gets created in CVAT can hold a maximum of 1 image.

      • Note: Setting this policy to a value greater than 1 means that, when a given job is saved, all images in the job get synced back to DataStore during auto sync as ‘human’ annotated, even though the user might not have modified the annotations of all those images.

    • max_images_per_live_task

      • Update this policy to set a limit on the maximum number of images that a live task in CVAT can hold. The images in this case come from the live feed. Consider giving this policy a lower number if you face frame drop issues.

      • Example: "max_images_per_live_task": 50

      • Note: Set this policy to 0 if you do not want any live tasks to be created.

    • login_retry_limit

      • Update this policy to a reasonable number that decides how many times Intel® Edge Data Collection will attempt to log in to CVAT before giving up.

      • Example: "login_retry_limit": 5

  • Note: After running run.sh, open the CVAT web application by navigating to https://<host_machine_IP>:8443 in Google Chrome and log in using the superuser credentials you provided in the .env file. Perform annotation as necessary for the created tasks. Annotators within the team can create accounts for themselves using the CVAT web app and annotate the tasks that the superuser assigned to them.

  • IMPORTANT

    • Do not log out of CVAT in the browser, as this causes issues in the interaction with Intel® Edge Data Collection. In case you do log out, restart the Intel® Edge Data Collection container after you log back in.

    • Google Chrome is the only browser supported by CVAT; see the CVAT documentation for more details.


Remote Storage #

  • Intel® Edge Data Collection provides an optional remote storage feature that allows users to export object detection data or classification data, in MS-COCO and ImageNet data formats respectively, to persistent file storage. The file storage can be a local disk or a network mounted volume, as set via the environment variable DCAAS_STORAGE_DIR in the IEdgeInsights/build/.env file. Users with valid Azure storage credentials can also upload the same data to the Azure Blob Storage service. The data export feature can be run periodically (autosync) or as a one-time (snapshot) activity. Read further below for more.

    Export behavior

    Depending upon the image_recognition type chosen, the exported directory structure varies. For both types, topic-wise parent directories are created, within which the data is exported.

    • Classification

      For the classification type of export, a class-wise folder structure is created inside the topic directory, containing the images tagged to each class. This way of segregating images is also known as the ImageNet data format for classification. Before exporting to the relevant label directory, we identify whether the export is a binary class export or a multi-class export, using the labels present in the category list in Intel® Edge Data Collection’s config.json. The directory assignment logic for export is as follows (a code sketch of this logic follows the examples below).

      • If only one category is present in the category list, the export is considered a binary class export:

        • For an img_handle, if the label (from the category list) is found to be the NEGATIVE class (i.e., score < threshold), the image is moved into the non_<label> dir

        • For an img_handle, if the label (from the category list) is found to be the POSITIVE class (i.e., score >= threshold), the image is moved into its <label> dir

      • However, if multiple categories are present in the category list, the export is considered a multi-class export:

        • For an img_handle, if all the labels found in the db are of the NEGATIVE class (i.e., score < threshold), the image is moved to the no_class dir

        • For an img_handle, if any/all labels are found to be of the POSITIVE class (i.e., score >= threshold), the image is copied into each respective <label> dir

        For example, consider the following metadata in influx for the table edge_video_analytics_results (a “-” means no score is present for that label and img_handle):

        | img_handle | cat | dog | person |
        |------------|-----|-----|--------|
        | A          | 0   | 0   | 0      |
        | B          | -   | -   | 1      |
        | C          | 0   | 1   | 0      |
        | D          | -   | 1   | 1      |

        • if category list has [cat, dog, person],

            └── edge_video_analytics_results
                ├── dog
                │   ├── C.jpg
                │   └── D.jpg
                ├── no_class
                │   └── A.jpg
                └── person
                    ├── B.jpg
                    └── D.jpg
        
        • if category list has only [person],

          └── edge_video_analytics_results
              ├── non_person
              │   ├── A.jpg
              │   └── C.jpg
              └── person
                  ├── B.jpg
                  └── D.jpg
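
      As referenced above, the directory assignment logic can be summarized with the following sketch. The function and parameter names are illustrative, not taken from the actual implementation:

          def target_dirs(scores, categories, thresholds):
              """Illustrative directory assignment for classification export.

              scores: dict of label -> score stored in the db for one img_handle
                      (a missing label corresponds to "-" in the table above).
              categories: the category list from config.json.
              thresholds: dict of label -> classification threshold.
              """
              if len(categories) == 1:
                  # Binary class export: POSITIVE -> <label>, NEGATIVE -> non_<label>.
                  label = categories[0]
                  if scores.get(label, 0) >= thresholds[label]:
                      return [label]
                  return ["non_" + label]

              # Multi-class export: copy into every POSITIVE label's directory,
              # or into no_class when every label is NEGATIVE or absent.
              positives = [c for c in categories if scores.get(c, 0) >= thresholds[c]]
              return positives or ["no_class"]

      With a threshold of 0.5 per label, this reproduces both example trees above (e.g., img_handle D maps to dog and person, img_handle A maps to no_class).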
        
    • Object Detection

      For the object detection type of export, the data format used is MS-COCO. Similar to the classification export, data is grouped inside a topic directory and then segregated into day-wise sub-directories (YYYYMMDD format) based on the ingestion time into influx. Inside each day directory, the data is exported in COCO format. See the sample structure below, followed by an illustrative instances file.

          └── edge_video_analytics_results
              ├── 20240217
              │   ├── annotations
              │   │   └── instances_20240217.json
              │   └── images
              │       ├── x.jpg
              │       └── y.jpg
              └── 20240218
                  ├── annotations
                  │   └── instances_20240218.json
                  └── images
                      ├── a.jpg
                      ├── b.jpg
                      └── c.jpg
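
      As referenced above, an instances_<date>.json file in the MS-COCO format generally carries three top-level lists, sketched below with illustrative values. Note that standard COCO bboxes are [x, y, width, height]; the exact fields written by this export may differ, so treat this only as an orientation aid:

          {
              "images": [
                  {"id": 1, "file_name": "x.jpg", "width": 1920, "height": 1080}
              ],
              "annotations": [
                  {"id": 1, "image_id": 1, "category_id": 1,
                   "bbox": [100, 50, 40, 80], "area": 3200, "iscrowd": 0}
              ],
              "categories": [
                  {"id": 1, "name": "person"}
              ]
          }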
      
    • Note: if data is exported in snapshot mode, the exported data above is kept in a separate directory named with the time of the run, to distinguish it from other snapshot exports, if any.

    Configuration

    Remote storage provides the following options to configure in config.json:

      "storage":{
          "mode": 0,  # 0,1,2
          "type":"disk",  # or 'azure'
          "auto_sync_interval_in_sec": 30, 
          "ingestion_fps": 9,
          "db_topics": ["edge_video_analytics_results"]
          "img_recognition_configs":[
            {
              "type": "classification", # or 'object_detection'
              "export_format": "imagenet",
              "classification_threshold": {"anomalous": 0.5},
              "filter": {
                "annotation_type": ["auto", "human"],
                "start_date": "2023-06-06T00:00:00Z",
                "end_date": "2024-12-30T13:00:00Z"
                }
            }
          ]
      }
    
    • mode

      • Set it to 0 if you wish to turn off the remote storage feature; that makes the other options in the storage config irrelevant. The following modes are supported:

        • 0: turn off data export functionality

        • 1: snapshot mode: a one time data export based on the img recognition config options provided.

        • 2: autosync mode: a periodic data export based on the img recognition config options provided.

    • type

      • disk

        • When set to disk, the target directory is a disk path (host or network mounted). The path is set by DCAAS_LOCAL_STORAGE_DIR in the /IEdgeInsights/build/.env file. Ensure that the path has the necessary write permission for the Intel® Edge Data Collection user; see “Permissions for storage dir” below for how to do that.

      • azure

        • When set to azure, the target directory is an Azure storage container. Users must have a valid Azure storage subscription before being able to run remote storage in azure mode. The following account keys and Azure storage container path need to be added as environment variables to the build .env file, besides DCAAS_LOCAL_STORAGE_DIR. See the Additional Info section on how to retrieve these values from Azure.

          AZURE_STORAGE_CONNECTION_STRING= # <storage account connection string>
          AZURE_STORAGE_CONTAINER= # <container or directory in azure cloud where data will be uploaded>
          

        NOTE: Regardless of the storage type used, disk or azure, DCAAS_LOCAL_STORAGE_DIR must be set. When the storage type is disk, this is the target directory; when the storage type is azure, a sub-directory called tmp is created inside this path and is used as a temporary directory for caching local changes before they are uploaded to Azure. Unlike with the disk storage type, the contents of the temporary directory are not guaranteed to persist.

    • auto_sync_interval_in_sec

      • Time interval (in seconds) at which auto sync runs.

    • ingestion_fps

      • An estimate of the number of frames being published per second; defaults to 9. This helps the remote storage algorithm to adjust/limit the number of rows fetched from the DataStore microservice during auto sync: a higher value increases the maximum number of rows fetched per query at the cost of a possibly larger response time, and vice versa. The value cannot be 0 or negative. As a rough sizing intuition (an interpretation, not a documented formula), with ingestion_fps of 9 and auto_sync_interval_in_sec of 30, on the order of 9 × 30 = 270 new rows would accumulate per sync window.

    • db_topics

      • List of topics/tables in the data store to be queried by the remote storage. Needs at least 1 topic if storage mode is set to 1 or 2

    • img_recognition_configs

      • Export sub-config related to a specific image recognition type. For individual image recognition configurations, see README.md.

    Permissions for storage dir

    If remote storage is enabled in the config, set the target path for storage and grant permissions as follows. In the .env file, set DCAAS_STORAGE_DIR to the directory where you wish to remotely save the annotations and images (ex: /path/to/persistent/remote/storage). As a better alternative to sudo chmod 777 /path/to/persistent/remote/storage (since this allows anyone to read/write the data), ensure that write permissions on the persistent remote storage path are given to the user mentioned in the env variable EII_USER, which defaults to eiiuser.
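
    For example, assuming EII_USER resolves to the default eiiuser and the path below is your storage directory (both are placeholders to adjust), the following commands grant the needed write access without resorting to chmod 777:

      # Hand the storage path over to the Intel® Edge Data Collection user.
      sudo chown -R eiiuser: /path/to/persistent/remote/storage
      # Ensure the owner can read, write and traverse the tree.
      sudo chmod -R u+rwX /path/to/persistent/remote/storage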


Configuring other options #

Configuring “options” in config.json (a sketch follows this list):

  • input_queue

    • Stores raw frames from video ingestion.

  • data_filter_queue

    • Stores frames from data filter.

  • output_queue

    • Stores processed frames that are ready to be published.

  • max_size for input_queue, data_filter_queue and output_queue.

    • Sets the size of the mentioned queues.

    • Consider giving it a higher value if you face frame drop issues.

    • If value is <=0, queue size is infinite.
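
As an illustration, the queue options might be expressed along these lines in config.json. The nesting of max_size under each queue is an assumption for this sketch, so check the shipped config.json for the authoritative shape:

  "options": {
      "input_queue": {"max_size": 100},
      "data_filter_queue": {"max_size": 100},
      "output_queue": {"max_size": 100}
  }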


Status Notification #

Intel® Edge Data Collection exports the status of task creation and of data export progress for snapshots in the form of a json file. For both, progress in terms of the number of images processed is continuously updated, and the status is set to COMPLETED or IN_PROGRESS depending upon whether all the frames have been processed.

The notification file can be found at ${EII_INSTALL_PATH}/data/dcaas/dcaas_notification.json.

Here is a sample notification json file (comments added for explanation).

{
    "services": [   # list of enabled services with status
        "remote_storage",
        "human_annotation"
    ],
    "remote_storage": {   # remote storage specific status
        "snapshot_list": [
            "2024-02-14-193903"   # list of snapshot. currently only 1 i.e.latest is mentioned
        ],
        "2024-02-14-193903": {    # snapshot instance details
            "time_utc": "2024-02-14T19:39:03",
            "config": {
                "mode": 1,
                "device": "disk",
                "auto_sync_interval_in_sec": 30,
                "ingestion_fps": 9,
                "img_recognition_configs": [
                    {
                        "type": "classification",
                        "export_format": "imagenet",
                        "filter": {
                            "start_date": "2023-06-06T00:00:00Z",
                            "end_date": "2024-12-30T13:00:00Z",
                            "annotation_type": [
                                "auto",
                                "human"
                            ]
                        },
                        "classification_threshold": {
                            "anomalous": 0.4
                        }
                    }
                ],
                "db_topics": [
                    "edge_video_analytics_results"
                ]
            },
            "status": "IN_PROGRESS",    # export status, 'COMPLETED' when done
            "export_dir": "/storage/2024-02-14-193903",   # exported relative dir
            "processed_frames": 200,    # frames exported till now
            "total_frames": 402
        }
    },
    "human_annotation": {   # human annotation specific status
        "time_utc": "2024-02-14T19:39:22.443201Z",
        "config": {
            "cvat": {
                "login_retry_limit": 10,
                "task_creation_policies": {
                    "max_images_per_job": 5,
                    "max_images_per_live_task": 5,
                    "max_images_per_task": 100
                }
            },
            "enabled": true,
            "image_selection_for_annotation_policies": {
                "db_topics": [
                    "edge_video_analytics_results"
                ],
                "end_date": "9999-12-30T13:00:00Z",
                "image_type": "annotated",
                "img_handles": [],
                "last_x_seconds_of_images": 0,
                "start_date": "2024-01-02T00:00:00Z"
            },
            "task_sync_policies": {
                "auto_sync_interval_in_sec": 30,
                "is_auto_sync": true
            }
        },
        "status": "COMPLETED",    # task creation status
        "processed_frames": 561,    # net frames processed as task are being created
        "total_frames": 561
    }
}
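
As a usage sketch, a script can watch this file to detect when a snapshot export finishes. The file path and the keys follow the sample above; everything else (polling interval, printing) is illustrative:

    import json
    import os
    import time
    from pathlib import Path

    # EII_INSTALL_PATH is assumed to be exported in the environment,
    # matching the documented notification path above.
    NOTIF = Path(os.environ["EII_INSTALL_PATH"]) / "data/dcaas/dcaas_notification.json"

    def latest_snapshot():
        """Return (status, processed_frames, total_frames) of the latest snapshot."""
        data = json.loads(NOTIF.read_text())
        store = data.get("remote_storage", {})
        snapshots = store.get("snapshot_list", [])
        if not snapshots:
            return None
        snap = store[snapshots[-1]]  # currently only the latest snapshot is listed
        return snap["status"], snap["processed_frames"], snap["total_frames"]

    while True:
        snap = latest_snapshot()
        if snap:
            print(f"status={snap[0]} progress={snap[1]}/{snap[2]}")
            if snap[0] == "COMPLETED":
                break
        time.sleep(10)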