Accessing data exported from your instance

For auditing or compliance purposes, you may want to see exactly what data has been shared from your Enterprise instance. All data shared from the instance is stored in an S3 bucket in an AWS account owned by Gitpod. See the Data and Observability page for more information on the observability architecture.

Accessing the Data Shared

Customers can access the S3 bucket where the data is stored from any role or user in the Gitpod instance's AWS account by following these steps:

Upon request, your Gitpod account manager can give you the name of the S3 bucket that receives the data from your instance.
Set up your AWS CLI environment to assume any role or user in the AWS account where Gitpod is installed. For example, you can use the same user or role that applied the CloudFormation template to install Gitpod.
```
# e.g. if you are using environment variables
export AWS_SECRET_ACCESS_KEY=""
export AWS_ACCESS_KEY_ID=""
export AWS_SESSION_TOKEN=""
```

Use the CLI to inspect the data:
```
# List the bucket
aws s3 ls <bucket-name-provided-by-gitpod>
# Download a specific file
aws s3 cp s3://<bucket-name-provided-by-gitpod>/k8s-state/meta/2023/06/03/23/k8s-state-meta-1-2023-06-03-23-59-21-12dab8f5-0d40-4069-a679-172f94f13304 kubstate-example.json
```
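
If listing the bucket fails, it can help to confirm which AWS account the exported credentials resolve to and compare it with the account where Gitpod is installed. A minimal sketch using the standard `aws sts get-caller-identity` call; the function name `current_account` is hypothetical and only serves as an illustration:

```shell
# Print the AWS account ID that the current credentials resolve to.
# `current_account` is an illustrative helper name, not part of any Gitpod tooling.
current_account() {
  aws sts get-caller-identity --query Account --output text
}

# Usage (after exporting the credentials from step 2):
#   current_account
```

The printed account ID should match the AWS account of your Gitpod instance.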

The storage format depends on the telemetry type. For non-metrics data, the files can be directly inspected. For metrics data, see the instructions below.

Accessing Metrics Data

Accessing and Decoding Prometheus Metrics

Prometheus metrics data is stored in protobuf format, which requires special handling to decode and inspect. The following steps will help you access, decode, and search through Prometheus metrics data:

Download the prometheus-decoder tool from GitHub:

```
curl -L -o prometheus-decoder https://github.com/gitpod-io/prometheus-decoder/releases/download/v0.1.0/prometheus-decoder-linux-amd64
chmod +x prometheus-decoder
```

Create a directory to store the metrics data:
```
mkdir -p metrics-data
```
Sync the metrics data from the S3 bucket to your local directory. Replace the path with the appropriate date path for the metrics you want to analyze:
```
aws s3 sync s3://<bucket-name-provided-by-gitpod>/telemetry/metrics/YYYY/MM/DD/ ./metrics-data
```
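
The date path in the sync command can be built from a calendar date rather than typed by hand. A minimal sketch, assuming the `telemetry/metrics/YYYY/MM/DD/` layout shown above; the function name `metrics_prefix` is hypothetical and not part of any Gitpod tooling:

```shell
# Build the S3 prefix for one day of metrics.
# Arguments: bucket name, date in YYYY-MM-DD form.
metrics_prefix() {
  bucket="$1"
  day="$2"
  # Turn 2023-06-03 into 2023/06/03 to match the bucket layout.
  path=$(printf '%s' "$day" | sed 's|-|/|g')
  printf 's3://%s/telemetry/metrics/%s/\n' "$bucket" "$path"
}

# Usage:
#   aws s3 sync "$(metrics_prefix <bucket-name-provided-by-gitpod> 2023-06-03)" ./metrics-data
```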

Use the prometheus-decoder tool to decode the metrics files and search for specific metrics:

```
# Decode a single file
./prometheus-decoder -input ./metrics-data/filename

# Decode all files and search for a specific metric
find ./metrics-data -type f -exec ./prometheus-decoder -input {} \; | grep 'metric_name' -A 20 -B 2 > ./results.txt
```

The results will be in JSON format, making it easier to read and analyze the metrics data.

Example: To search for the gitpod_scm_token_refresh_requests_total metric:

```
find ./metrics-data -type f -exec ./prometheus-decoder -input {} \; | grep 'gitpod_scm_token_refresh_requests_total' -A 20 -B 2 > ./results.txt
```

This will find all instances of the metric in the decoded data, include 2 lines before and 20 lines after each match, and save the results to a file called results.txt.
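
When searching for several metrics, re-running the decoder for every search can be slow. One possible approach is to decode each file once into a separate directory and then search the decoded copies. The sketch below assumes only the `-input` flag shown above; the function name `decode_all` and the `./decoded` directory are hypothetical:

```shell
# Decode every file under a source directory once, writing one output file per input.
# Arguments: source dir, output dir, optional path to the decoder binary.
decode_all() {
  src_dir="$1"
  out_dir="$2"
  decoder="${3:-./prometheus-decoder}"
  mkdir -p "$out_dir"
  find "$src_dir" -type f | while IFS= read -r f; do
    # Output is named after the input file; files that share a base name
    # in different subdirectories would overwrite each other.
    "$decoder" -input "$f" > "$out_dir/$(basename "$f").json"
  done
}

# Usage:
#   decode_all ./metrics-data ./decoded
#   grep -rl 'gitpod_scm_token_refresh_requests_total' ./decoded
```

Here `grep -rl` lists only the decoded files that mention the metric, which narrows down where to look before opening individual files.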

Escalation Process for Data Leaks

If you find data in the S3 bucket that contains personally identifiable or confidential information that should not have been shared, the process for notifying Gitpod and remediating the issue is as follows:

Customer inspects the data to identify potentially sensitive leaks: Customers can inspect any data that was sent to Gitpod by accessing the S3 bucket that receives all data from the instance (see “Accessing the Data Shared” above).
Customer reports the data leak: Upon identifying leaked confidential data, the customer can trigger a security incident via their Gitpod account manager.
Data is deleted: The leaked data is identified and measures are taken to delete it from S3 and then from any third-party systems.
- For S3, there is the option to delete the entire bucket. In any case, the data in this bucket is configured with a very short retention period. See Observability and Data.
- If the effort is deemed worthwhile, individual objects can be deleted instead.
- For third-party services, the details depend on the service and the data that was leaked.
Improvements are made: The root cause of the leak is identified, and measures are put in place to prevent it from recurring.
