group-stats-event-schemas

Centralized event schema repo for group stats.

Event schemas

This repo contains all the event schemas used across Domain (currently only group-stats events), under the schemas folder.

  • Each schema has a versioned schema file (JSON Schema) and an accompanying samples file, in a folder named after the schema.
  • Changes to a schema must be made by adding a new version, following semver guidelines. The following release types are supported:
    • patch: fixes an existing validation issue. This is a non-breaking change.
    • minor: adds new properties. This is a non-breaking change.
    • major: makes a breaking change.
  • Every new PR must target the stage branch first and only then master. The build will fail otherwise.
  • The master branch is used in production and the stage branch in the staging environment.
  • Merging to stage or master makes the schemas available via the following HTTP endpoints immediately (if the builds pass):

| Endpoint | Staging | Production | Note |
| --- | --- | --- | --- |
| GET Schema | https://stage-event-schemas.domain.com.au/v1/group-stats/AdvertView/2.0.6 | https://event-schemas.domain.com.au/v1/group-stats/AdvertView/2.0.6 | Use this endpoint to retrieve a published schema for a given schema key and version |
| GET Meta | https://stage-event-schemas.domain.com.au/v1/group-stats/AdvertView/meta | https://event-schemas.domain.com.au/v1/group-stats/AdvertView/meta | Use this endpoint to retrieve meta info for a given schema key |
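
The endpoints listed above follow a regular pattern: host, then /v1/, then the schema key, then either a version or meta. The sketch below builds those URLs. The function names schemaUrl and metaUrl and the environment labels stage and production are illustrative assumptions, not part of this repo.

```javascript
// Base hosts copied from the endpoint list above.
const HOSTS = {
  stage: 'https://stage-event-schemas.domain.com.au',
  production: 'https://event-schemas.domain.com.au',
};

// Build the URL of a published schema for a schema key and version.
// (Hypothetical helper, shown only to illustrate the URL pattern.)
function schemaUrl(env, schemaKey, version) {
  const host = HOSTS[env];
  if (!host) throw new Error(`Unknown environment: ${env}`);
  return `${host}/v1/${schemaKey}/${version}`;
}

// Build the URL of the meta info for a schema key.
function metaUrl(env, schemaKey) {
  return schemaUrl(env, schemaKey, 'meta');
}

console.log(schemaUrl('stage', 'group-stats/AdvertView', '2.0.6'));
console.log(metaUrl('production', 'group-stats/AdvertView'));
```

The resulting URL can be passed to any HTTP client to fetch the published schema or its meta info.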

How to add a new schema or update an existing schema?

Prerequisites (Local dev setup)

Steps to add/update a schema

  • git clone the repo to a local directory on your machine.
  • cd into the local directory.
  • Create a new branch for your changes, based off the stage branch.
  • Run yarn install in the root directory of the locally cloned repo.
  • Run yarn run new group-stats/YourEventType to add a new event YourEventType to the group-stats directory with a default minimal schema, samples, README and a meta file. If YourEventType already exists, the command copies the latest version and creates a new version of the schema and samples inside the same directory. Additionally, you can pass in a release type (patch, minor, major).
  • The meta (yaml) file can be used to provide meta information:
    • teamName: Your team name. If multiple teams are involved, make it an array.
    • deprecatedVersions: If you decide to deprecate any versions, add them to this array. Keep in mind that not all event producers may be using the same version; some could still be using an older version of the event schema.
    • criticality: One of high, medium, low. Talk to the Data Activation team to find out if your event is critical. Critical events have more monitoring and a stricter review process.
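
When an existing event gets a new version, the release type decides which part of the version number changes. The sketch below shows the standard semver arithmetic. nextVersion is a hypothetical helper and is not the script behind yarn run new.

```javascript
// Compute the next version for a release type, following semver rules:
// patch bumps the last number, minor resets patch, major resets both.
function nextVersion(current, releaseType) {
  const [major, minor, patch] = current.split('.').map(Number);
  switch (releaseType) {
    case 'patch':
      return `${major}.${minor}.${patch + 1}`;
    case 'minor':
      return `${major}.${minor + 1}.0`;
    case 'major':
      return `${major + 1}.0.0`;
    default:
      throw new Error(`Unsupported release type: ${releaseType}`);
  }
}

console.log(nextVersion('2.0.6', 'patch')); // 2.0.7
console.log(nextVersion('2.0.6', 'minor')); // 2.1.0
console.log(nextVersion('2.0.6', 'major')); // 3.0.0
```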

Test your changes

  • yarn check-schemas: Runs checks against all schemas.
  • yarn check-schemas:single group-stats/AdvertView: Runs checks against a single schema key. This is only for local testing, so you can test your schemas quickly. Replace group-stats/AdvertView with your schema key.

Submit your changes for review

  • Commit and push your changes.
  • Create a PR against the stage branch, title it YourEventType@Version, and let us know in #event-schemas-prs.

Event versions

All event schemas are versioned and, once published, cannot be changed. Any change must be made by adding a new version. All 1.*.* versions of the group stats event schemas are auto-generated. Until now, all events used either 2.0 or v2.0 in the EventVersion property of their payload, so those values are mapped to the latest 1.*.* version. All new schemas added manually should start at version 2.0.1, and that version should be used in the EventVersion property of the event payload.

Semver ranges in EventVersion

When sending events to group stats, you can use the semver range format to target schema versions. For example, you can use ^2, ~2, etc. instead of specifying a fixed schema version. Group stats resolves the range to the highest published version that satisfies it.
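
To make "highest satisfying version" concrete, here is a minimal sketch of range resolution. It is not the group stats implementation. It handles only full exact versions and ^ and ~ ranges with a major version of 1 or higher, and it ignores prerelease tags. The list of published versions is made up for illustration.

```javascript
// Parse "2.1.3" into [2, 1, 3].
function parse(version) {
  return version.split('.').map(Number);
}

// Compare two parsed versions: negative if a < b, positive if a > b.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does `version` satisfy `range`? Supports "x.y.z", "^..." and "~..." only.
function satisfies(version, range) {
  const v = parse(version);
  const op = range[0] === '^' || range[0] === '~' ? range[0] : '';
  const parts = range.slice(op.length).split('.').map(Number);
  const lower = [parts[0], parts[1] || 0, parts[2] || 0];
  if (op === '') return compare(v, lower) === 0;
  if (compare(v, lower) < 0) return false;
  // ^ keeps the major fixed; ~ also keeps the minor fixed when one is given.
  if (v[0] !== lower[0]) return false;
  if (op === '~' && parts.length > 1 && v[1] !== lower[1]) return false;
  return true;
}

// Return the highest version that satisfies the range, or null if none does.
function maxSatisfying(versions, range) {
  const matches = versions.filter((v) => satisfies(v, range));
  if (matches.length === 0) return null;
  return matches.reduce((best, v) => (compare(parse(v), parse(best)) > 0 ? v : best));
}

// Hypothetical set of published versions for one event type.
const published = ['1.0.3', '2.0.1', '2.0.6', '2.1.0', '3.0.0'];
console.log(maxSatisfying(published, '^2'));   // 2.1.0
console.log(maxSatisfying(published, '~2.0')); // 2.0.6
```

A range keeps producers on the newest compatible schema without redeploying, while a fixed version pins validation to exactly one schema.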

Testing your schemas on stage

Once your PR is merged to stage, your schema is available on stage for you to test. You can send events to group stats in the stage environment using the new version and verify that they are validated as expected.

  • Produce some events from your service on stage.
  • Or, if you don't have a service set up already, produce events manually. In the sample payload below, replace the EventType value with your event type and the EventVersion value with your new version (a semver range or a specific version):
{
    "Data": {
        "ClientType": "Website - Desktop",
        "EventGeneratedTimestamp": 1589256320255,
        "EventType": "SurveyResponded",
        "EventVersion": "^2",
        "MetaData": {
            "Context": "search-result",
            "EventProvider": "domain-survey-api",
            "GAClientId": "420940774.1573787940",
            "RespondentId": "b74d52aa-5121-4fb0-8606-bd5bfbbf0aa4",
            "SurveyId": "sur_1",
            "SurveyItemId": "ite_sc_first_home",
            "SurveyItemOptionId": "opt_1",
            "SurveyVersion": "v0",
            "UserToken": "b74d52aa-5121-4fb0-8606-bd5bfbbf0aa4"
        }
    }
}

Use the Kibana links below to debug your events:

| Staging | Production | Note |
| --- | --- | --- |
| Schema Found | Schema Found | Use this to see logs when a schema is found. |
| Schema Not Found | Schema Not Found | Use this to see logs when a schema is not found. |
| Schema Version Not Found | Schema Version Not Found | Use this to see logs when a schema version is not found. |
| Validation Failed | Validation Failed | Use this to see logs by EventType/EventProvider when an event fails validation. |
| Dashboard | Dashboard | Combines all of the above into a single dashboard. |

NOTE: Events that pass validation are not logged, so you will not find them in the logs. Validation Failed logs are also not written every time: they are sampled by a given factor and throttled (dropped) if they exceed a certain rate per second. Other logs are not written every time either, because the results are cached.
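
The sketch below illustrates why a particular failed event may be missing from the logs. It combines sampling (log every Nth failure) with a per-second cap. It is only a guess at the general technique: the actual factor, rate limit and implementation used by group stats are not documented here, and createFailureLogGate and its parameters are hypothetical.

```javascript
// Create a function that decides whether a validation failure gets logged.
// sampleEvery:  log only every Nth failure (the assumed "factor").
// maxPerSecond: hard cap on logged failures per one-second window.
function createFailureLogGate(sampleEvery, maxPerSecond) {
  let seen = 0;
  let windowStart = null;
  let loggedInWindow = 0;

  // nowMs is passed in so the behaviour is deterministic and easy to test.
  return function shouldLog(nowMs) {
    seen += 1;
    if (seen % sampleEvery !== 0) return false; // sampled out
    if (windowStart === null || nowMs - windowStart >= 1000) {
      windowStart = nowMs; // start a new one-second window
      loggedInWindow = 0;
    }
    if (loggedInWindow >= maxPerSecond) return false; // throttled
    loggedInWindow += 1;
    return true;
  };
}

// 100 failures arriving in the same instant, sampled 1-in-10, capped at 2/s.
const shouldLog = createFailureLogGate(10, 2);
let logged = 0;
for (let i = 0; i < 100; i++) {
  if (shouldLog(0)) logged += 1;
}
console.log(logged); // 2
```

With settings like these, sending a single invalid event is unlikely to produce a log line, so send a burst of events when testing on stage.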

Known issue: misleading additionalProperties validation error

You may notice errors such as #/ClientType: #/additionalProperties/$false: #/ClientType (All values fail against the false schema). This usually means that you have set additionalProperties to false but provided properties that are not part of the schema definition. However, there is a known issue where this error appears even when there are no additional properties. This happens only when validation also fails for some other reason. Fix the other errors first and this one goes away on its own.
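
To tell a genuine additionalProperties violation from the misleading one, you can compare your payload's keys with the properties the schema declares. The sketch below does this for top-level properties only. The schema is a trimmed-down, hypothetical example and undeclaredProperties is not part of this repo.

```javascript
// A hypothetical, minimal schema that forbids undeclared properties.
const schema = {
  type: 'object',
  properties: {
    ClientType: { type: 'string' },
    EventType: { type: 'string' },
  },
  additionalProperties: false,
};

// List the top-level keys of `data` that the schema does not declare.
// Nested objects and patternProperties are not handled in this sketch.
function undeclaredProperties(schema, data) {
  const declared = Object.keys(schema.properties || {});
  return Object.keys(data).filter((key) => !declared.includes(key));
}

// "Colour" is not declared, so this is a genuine violation.
console.log(undeclaredProperties(schema, { ClientType: 'Website - Desktop', Colour: 'red' }));

// No undeclared keys: if the error still shows up, look for another failure.
console.log(undeclaredProperties(schema, { ClientType: 'Website - Desktop' }));
```

An empty result means the payload has no extra top-level properties, so the error is most likely the misleading one described above.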

Case sensitivity

Normally, event types are case-insensitive. However, it is recommended to use the same casing, usually PascalCase, every time you use an event type.

Definitions

If you have multiple schemas that share common sub-schemas or a common base schema, you can use definitions to reuse them. For example, the GAClientId sub-schema is repeated in most of the schemas and is therefore referenced via $ref from the definitions directory. These definitions are also versioned. This saves you from repeating the same definition in all your events. There is also automation that updates schemas to the latest definitions.
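
The sketch below shows the idea of reuse through $ref: two event schemas point at one shared definition, and a small resolver inlines it. The reference id definitions/GAClientId/1.0.0, the schema contents and resolveRefs are all hypothetical. The real layout of the definitions directory and the way references are resolved in this repo may differ, and JSON Schema validators normally resolve $ref themselves.

```javascript
// Hypothetical registry of shared definitions, keyed by a made-up reference id.
const definitions = {
  'definitions/GAClientId/1.0.0': { type: 'string' },
};

// Two hypothetical event schemas that reuse the same GAClientId definition.
const advertView = {
  type: 'object',
  properties: { GAClientId: { $ref: 'definitions/GAClientId/1.0.0' } },
};
const surveyResponded = {
  type: 'object',
  properties: {
    GAClientId: { $ref: 'definitions/GAClientId/1.0.0' },
    SurveyId: { type: 'string' },
  },
};

// Recursively replace every { $ref } node with the definition it points to.
function resolveRefs(node) {
  if (Array.isArray(node)) return node.map(resolveRefs);
  if (node === null || typeof node !== 'object') return node;
  if (typeof node.$ref === 'string') {
    const target = definitions[node.$ref];
    if (!target) throw new Error(`Unknown definition: ${node.$ref}`);
    return resolveRefs(target);
  }
  const out = {};
  for (const [key, value] of Object.entries(node)) out[key] = resolveRefs(value);
  return out;
}

console.log(JSON.stringify(resolveRefs(advertView)));
console.log(JSON.stringify(resolveRefs(surveyResponded)));
```

Because the definition lives in one place, changing it means publishing one new definition version rather than editing every event schema by hand.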

Event validation architecture

See the Confluence page.

Want to add schemas for GA/MixPanel?

Drop a message in #customer-data-platform to discuss. New groups can be added by creating another folder under the schemas folder and following the same pattern.

CI/CD

Builds run on Jenkins; see the Jenkinsfile. The stack is created and updated with CloudFormation; see the cloudformation.yml file.

DR

  • Change of region: Change the AWS_REGION env variable in the Jenkinsfile to the new region.

Architecture

[Architecture diagram]