Skip to content

Data Export

Introduction

Scarf provides a robust platform for tracking package downloads and pixel views. The ability to export this data is crucial for analytics, reporting, and integrating with other tools. This guide aims to provide a clear and concise explanation of how to export data from Scarf, what data is exported, and how to make use of any available integrations.

Prerequisites

Exporting data from Scarf will only work if you are on a Scarf Paid Plan.

How to Export Event Data

Scarf Dashboard

To export data out of Scarf,

  1. go to the main dashboard and
  2. click "Export packages data".

This will export all data, for the default period, over the past month.

Export packages data

The data you can export from Scarf includes all events (defined as package downloads and pixel views) from every user that has interacted with your Scarf-enabled artifacts (packages and pixels). Upon clicking "Export packages data", this data will download as a .csv file.

Scarf API

You can also export this data using the Scarf API.

What Data is Exported

Export Fields

The event data export includes the following data fields

name type description
id text This uniquely identifies the event (pixel view or package download) that occurred.
type text This categorizes the type of event that occurred (e.g. pixel-fetch, manifest-fetch, binary-download, etc.).
package text For Scarf package downloads, this specifies which package has been downloaded.
version text For Scarf package downloads, this specifies which version of the package has been downloaded.
time timestamp This refers to the time in UTC that the event occurred.
referer text For Scarf pixel views, this refers to the page that was viewed.
user_agent text This refers to the User-Agent, which provides information around the method of installation, often including information such as operating system, device, browser, architecture, and client.
variables text This refers to any custom-specified variables that you might use Scarf to track in file package downloads.
origin_id text This uniquely identifies the user (through a specific device) who has interacted with a Scarf event.
origin_latitude numeric This is the latitude of the location Scarf is able to identify for the event.
origin_longitude numeric This is the longitude of the location Scarf is able to identify for the event.
origin_country text This is the country of the location Scarf is able to identify for the event.
origin_city text This is the city of the location Scarf is able to identify for the event.
origin_state text This is the state of the location Scarf is able to identify for the event.
origin_postal text This is the postal code (ZIP code, in the US) of the location Scarf is able to identify for the event.
origin_connection_type text This categorizes the type of IP address Scarf is able to identify (e.g. business, isp, hosting, etc.).
origin_company text If Scarf is able to associate the event with a known business entity, that business entity is listed here.
origin_domain text If Scarf is able to associate the event with a known business entity, that business entity's web domain address is listed here.
dnt boolean If the user includes a DNT request in their header, that is logged here and they will not be tracked.
confidence numeric The probability of correct identification of the data.
endpoint_id text This uniquely identifies the public-facing device that has interacted with a Scarf event. Unlike origin_id, it is notably not sensitive to changes in device information like client, user agent, etc.

How to Export Aggregate Data

The documentation for exporting aggregates can be found in Export aggregates. Here's an example curl request to download aggregate data. The output is newline delimited json.

curl -o {filename}.jsonl \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/x-ndjson" \
  "https://api.scarf.sh/v2/packages/{owner}/aggregates?start_date={start_date}&end_date={end_date}&breakdown=by-company"

How to Export Company Scarf Scores

The documentation for exporting company Scarf Score can be found in Export Company Scarf Scores. Here's an example curl request to download scarf score data.

curl -o scarf-score.csv \
    -H "Authorization: Bearer {token} \
    -H "Content-Type: text/csv" \
    "https://api.scarf.sh/v2/companies/{owner}/scoring"

The Scarf Score export includes the following data fields:

name type description
company text Name of the company
company_domain text Domain of the company. Eg. company.com
scarf_score numeric Scarf's score for a company
funnel_stage text Stage of a company's journey in using your software
first_seen text Date when a company was first seen
last_seen text Date when a company was most recently seen.

How to Export Company Events

The documentation for exporting company events can be found in Export Company Events. Here's and exampe curl request to download company events data.

curl -o company-events.csv \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: text/csv" \ 
  "https://api.scarf.sh/v2/companies/{owner}/{domain}/events?start_date={start_date}&end_date={end_date}"
The fields for this export can be found here

Integrations

Scarf to PostgreSQL

GitHub: https://github.com/scarf-sh/scarf-postgres-exporter

Overview

The Scarf to PostgreSQL Exporter is a script designed to pull down your raw Scarf data and send it into a PostgreSQL database. This script is intended to be run as a daily batch job. It provides an automated way to backfill and update your PostgreSQL database with Scarf's enhanced data.

Prerequisites

  • psql must be installed and available in your environment.
  • You will need an account on the Scarf Dashboard.
  • You will need a Scarf API access token. You can generate an API Token from the settings in your Scarf Dashboard account.

Settings

The following environment variables are required:

  • SCARF_API_TOKEN: Your Scarf API access token.
  • SCARF_ENTITY_NAME: Your Scarf username or the name of your organization.
  • PSQL_CONN_STRING: The PostgreSQL connection string.

optional

  • BACKFILL_DAYS: Number of days to backfill data. Defaults to 31 if not set.

Getting Started

For more details, you can visit the GitHub repository.

Future Integrations

Integrations are in development, if you have particular data sources you'd like Scarf to integrate with, we'd love to hear from you.