This guide explains how to work with datasets in Pretectum using the API. You will learn how to retrieve datasets, understand their properties, and use them effectively for organizing and searching your data.

Overview

Datasets are collections of data objects (records) that share the same schema structure. They represent the actual data stored in your Pretectum master data repository and provide a way to organize records into logical groups. The hierarchy in Pretectum is:
Business Area (e.g., Customer)
  └── Schema (e.g., Individual Customer)
       └── Dataset (e.g., US Customers, European Customers)
            └── Data Objects (actual customer records)

Why Datasets Matter

Datasets help you:
  1. Organize Data: Group records by region, source, time period, or any logical category
  2. Filter Searches: Narrow search results to specific subsets of data
  3. Track Data Quality: Monitor record counts and error rates per dataset
  4. Manage Data Lifecycle: Handle imports, exports, and deletions at the dataset level

Before You Begin

To use the Datasets API, you need:
  1. Get API Credentials: Obtain your client_id and client_secret from your Pretectum tenant administrator.
  2. Obtain Access Token: Exchange your credentials for an access token using the authentication endpoint.
  3. Get Business Area and Schema IDs: Retrieve the business area ID using List Business Areas and the schema ID using List Schemas. Dataset queries require both IDs.

Retrieving Datasets

Datasets are accessed through their parent schema. You need both the business area ID and schema ID to list datasets.

Step 1: Get Business Area ID

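// Look up a business area's ID by its display name.
async function getBusinessAreaId(accessToken, businessAreaName) {
  const response = await fetch('https://api.pretectum.io/v1/my/businessareas', {
    headers: { 'Authorization': accessToken }
  });
  // Fail loudly on HTTP errors instead of parsing an error body.
  if (!response.ok) {
    throw new Error(`Listing business areas failed with HTTP ${response.status}`);
  }

  const areas = await response.json();
  const area = areas.find(a => a.name === businessAreaName);
  return area?.businessAreaId; // undefined if no business area has that name
}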

const businessAreaId = await getBusinessAreaId(accessToken, 'Customer');

Step 2: Get Schema ID

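// Look up a schema's ID by name within a business area (checks the first page of results only).
async function getSchemaId(accessToken, businessAreaId, schemaName) {
  const response = await fetch(
    `https://api.pretectum.io/v1/businessareas/${businessAreaId}/schemas`,
    {
      headers: { 'Authorization': accessToken }
    }
  );
  // Fail loudly on HTTP errors instead of parsing an error body.
  if (!response.ok) {
    throw new Error(`Listing schemas failed with HTTP ${response.status}`);
  }

  const data = await response.json();
  const schema = data.items.find(s => s.name === schemaName);
  return schema?.schemaId; // undefined if no schema has that name
}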

const schemaId = await getSchemaId(accessToken, businessAreaId, 'Individual Customer');

Step 3: List Datasets

curl -X GET "https://api.pretectum.io/v1/businessareas/{businessAreaId}/schemas/{schemaId}/datasets" \
  -H "Authorization: your_access_token"
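
The same request can be sketched in JavaScript, in the style of Steps 1 and 2. This is a minimal sketch rather than an official client: the endpoint and header format are copied from the curl example above, while the function name listDatasets and the fetchImpl parameter are hypothetical, added so the function can be exercised without network access. It returns only the first page of results.

```javascript
// Sketch: list the datasets under one schema (first page only).
// `fetchImpl` is a hypothetical injection point for testing; it defaults to the global fetch.
async function listDatasets(accessToken, businessAreaId, schemaId, fetchImpl = globalThis.fetch) {
  const url =
    'https://api.pretectum.io/v1/businessareas/' +
    `${encodeURIComponent(businessAreaId)}/schemas/${encodeURIComponent(schemaId)}/datasets`;

  const response = await fetchImpl(url, {
    headers: { 'Authorization': accessToken }
  });

  // Surface HTTP failures instead of trying to parse an error body.
  if (!response.ok) {
    throw new Error(`Listing datasets failed with HTTP ${response.status}`);
  }

  const data = await response.json();
  // `items` holds the datasets; `nextPageKey` is set when more pages exist.
  return { datasets: data.items ?? [], nextPageKey: data.nextPageKey ?? null };
}
```

A caller would pass the IDs obtained in Steps 1 and 2, for example listDatasets(accessToken, businessAreaId, schemaId), and check nextPageKey to decide whether to request another page.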

Understanding the Response

{
  "items": [
    {
      "dataSetId": "20240925152201042a1b2c3d4e5f6789012345678901234",
      "dataSetName": "US Customers",
      "dataSetDescription": "Customer records for United States region",
      "businessAreaId": "20240115103000123a1b2c3d4e5f6789012345678901234",
      "businessAreaName": "Customer",
      "schemaId": "20240115103000456d1e2f3a4b5c6789012345678901234",
      "schemaName": "Individual Customer",
      "recordCount": 15420,
      "erroredRecordsCount": 12,
      "runningJobsCount": 0,
      "version": 5,
      "createdByEmail": "john.admin@example.com",
      "createdByName": "John Admin",
      "updatedByEmail": "john.admin@example.com",
      "updatedByName": "John Admin",
      "createdDate": "2024-09-25T15:22:01.042Z",
      "updatedDate": "2024-12-15T10:30:00.000Z",
      "deleted": false
    }
  ],
  "nextPageKey": "eyJMYXN0RXZhbHVhdGVkS2V5Ijp7..."
}

| Field | Description |
| --- | --- |
| dataSetId | Unique identifier for the dataset |
| dataSetName | Display name used for filtering in search operations |
| dataSetDescription | Explanation of the dataset's purpose |
| businessAreaId / businessAreaName | Parent business area |
| schemaId / schemaName | Parent schema defining the data structure |
| recordCount | Total number of data objects in the dataset |
| erroredRecordsCount | Number of records with validation errors |
| runningJobsCount | Number of background jobs currently processing |
| version | Version number for tracking changes |
| deleted | Whether the dataset has been soft-deleted |
| nextPageKey | Pagination token for fetching the next page |

Once you have the list of datasets, use the dataSetName field to filter your data object searches.

Searching Within a Dataset

# Search within a specific dataset
curl -X GET "https://api.pretectum.io/dataobjects/search?query=John&dataSet=US%20Customers" \
  -H "Authorization: your_access_token"

# Combine with business area and schema filters
curl -X GET "https://api.pretectum.io/dataobjects/search?query=John&businessArea=Customer&schema=Individual%20Customer&dataSet=US%20Customers" \
  -H "Authorization: your_access_token"
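
When the filters come from user input, it can help to build the query string programmatically. The helper below is a sketch under the assumption that the search endpoint accepts exactly the parameters shown in the curl examples (query, businessArea, schema, dataSet); the name buildSearchUrl is hypothetical. It uses encodeURIComponent so that spaces are encoded as %20, matching the examples above, and it skips filters that are unset.

```javascript
// Sketch: build a search URL like the curl examples above.
// Filter names (businessArea, schema, dataSet) are taken from those examples.
function buildSearchUrl(query, filters = {}) {
  const params = { query, ...filters };
  const queryString = Object.entries(params)
    // Skip unset filters so they are not sent as "null" or as empty values.
    .filter(([, value]) => value !== undefined && value !== null && value !== '')
    // encodeURIComponent encodes spaces as %20.
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return `https://api.pretectum.io/dataobjects/search?${queryString}`;
}

console.log(buildSearchUrl('John', { dataSet: 'US Customers' }));
// https://api.pretectum.io/dataobjects/search?query=John&dataSet=US%20Customers
```

The resulting URL is sent with the same Authorization header as the curl requests.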

Complete Client Implementation

Here is a complete implementation that handles the full hierarchy from business areas to datasets:
const API_BASE = 'https://api.pretectum.io';

class PretectumClient {
  constructor(clientId, clientSecret) {
    this.clientId = clientId;
    this.clientSecret = clientSecret;
    this.accessToken = null;
    this.tokenExpiry = null;
    this._businessAreas = null;
    this._schemas = {};
    this._datasets = {};
  }

  async authenticate() {
    const response = await fetch(`${API_BASE}/oauth2/token`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        client_id: this.clientId,
        client_secret: this.clientSecret
      })
    });

    if (!response.ok) throw new Error('Authentication failed');

    const data = await response.json();
    this.accessToken = data.access_token;
    this.tokenExpiry = Date.now() + (data.expires_in * 1000);
  }

  async ensureAuthenticated() {
    if (!this.accessToken || Date.now() >= this.tokenExpiry) {
      await this.authenticate();
    }
  }

  async getBusinessAreas() {
    await this.ensureAuthenticated();
    if (this._businessAreas) return this._businessAreas;

    const response = await fetch(`${API_BASE}/v1/my/businessareas`, {
      headers: { 'Authorization': this.accessToken }
    });
    this._businessAreas = await response.json();
    return this._businessAreas;
  }

  async getBusinessAreaId(name) {
    const areas = await this.getBusinessAreas();
    const area = areas.find(a => a.name === name);
    return area?.businessAreaId;
  }

  async getSchemas(businessAreaId) {
    await this.ensureAuthenticated();
    if (this._schemas[businessAreaId]) return this._schemas[businessAreaId];

    const allSchemas = [];
    let pageKey = null;

    do {
      const url = new URL(`${API_BASE}/v1/businessareas/${businessAreaId}/schemas`);
      if (pageKey) url.searchParams.set('pageKey', pageKey);

      const response = await fetch(url, {
        headers: { 'Authorization': this.accessToken }
      });
      const data = await response.json();
      allSchemas.push(...data.items);
      pageKey = data.nextPageKey;
    } while (pageKey);

    this._schemas[businessAreaId] = allSchemas;
    return allSchemas;
  }

  async getSchemaId(businessAreaId, schemaName) {
    const schemas = await this.getSchemas(businessAreaId);
    const schema = schemas.find(s => s.name === schemaName);
    return schema?.schemaId;
  }

  async getDatasets(businessAreaId, schemaId) {
    await this.ensureAuthenticated();
    const cacheKey = `${businessAreaId}_${schemaId}`;
    if (this._datasets[cacheKey]) return this._datasets[cacheKey];

    const allDatasets = [];
    let pageKey = null;

    do {
      const url = new URL(
        `${API_BASE}/v1/businessareas/${businessAreaId}/schemas/${schemaId}/datasets`
      );
      if (pageKey) url.searchParams.set('pageKey', pageKey);

      const response = await fetch(url, {
        headers: { 'Authorization': this.accessToken }
      });
      const data = await response.json();
      allDatasets.push(...data.items);
      pageKey = data.nextPageKey;
    } while (pageKey);

    this._datasets[cacheKey] = allDatasets;
    return allDatasets;
  }

  async getDatasetNames(businessAreaName, schemaName) {
    const businessAreaId = await this.getBusinessAreaId(businessAreaName);
    if (!businessAreaId) throw new Error(`Business area not found: ${businessAreaName}`);

    const schemaId = await this.getSchemaId(businessAreaId, schemaName);
    if (!schemaId) throw new Error(`Schema not found: ${schemaName}`);

    const datasets = await this.getDatasets(businessAreaId, schemaId);
    return datasets
      .filter(ds => !ds.deleted)
      .map(ds => ds.dataSetName);
  }

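  async search(query, options = {}) {
    await this.ensureAuthenticated();

    // Drop unset filters so they are not sent as the literal string "null".
    const filters = Object.fromEntries(
      Object.entries(options).filter(([, value]) => value != null && value !== '')
    );
    const params = new URLSearchParams({ query, ...filters });
    const response = await fetch(`${API_BASE}/dataobjects/search?${params}`, {
      headers: { 'Authorization': this.accessToken }
    });
    if (!response.ok) throw new Error(`Search failed with HTTP ${response.status}`);

    return response.json();
  }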

  async getDatasetStats(businessAreaName, schemaName) {
    const businessAreaId = await this.getBusinessAreaId(businessAreaName);
    if (!businessAreaId) throw new Error(`Business area not found: ${businessAreaName}`);

    const schemaId = await this.getSchemaId(businessAreaId, schemaName);
    if (!schemaId) throw new Error(`Schema not found: ${schemaName}`);

    // Exclude soft-deleted datasets so the count and the totals describe the same set.
    const datasets = (await this.getDatasets(businessAreaId, schemaId))
      .filter(ds => !ds.deleted);

    return {
      totalDatasets: datasets.length,
      totalRecords: datasets.reduce((sum, ds) => sum + (ds.recordCount || 0), 0),
      totalErrors: datasets.reduce((sum, ds) => sum + (ds.erroredRecordsCount || 0), 0),
      datasets: datasets.map(ds => ({
        name: ds.dataSetName,
        records: ds.recordCount,
        errors: ds.erroredRecordsCount,
        lastUpdated: ds.updatedDate
      }))
    };
  }
}

// Usage
const client = new PretectumClient('your_client_id', 'your_secret');

// Get dataset names for a dropdown
const datasetNames = await client.getDatasetNames('Customer', 'Individual Customer');
console.log('Available datasets:', datasetNames);

// Get statistics
const stats = await client.getDatasetStats('Customer', 'Individual Customer');
console.log(`Total records: ${stats.totalRecords}`);
console.log(`Total errors: ${stats.totalErrors}`);

// Search within a dataset
const results = await client.search('John', {
  businessArea: 'Customer',
  schema: 'Individual Customer',
  dataSet: 'US Customers'
});
console.log(`Found ${results.total} results`);

Building a Dataset Selector

Create a cascading filter interface for business area → schema → dataset:
import { useState, useEffect } from 'react';

function DatasetSelector({ client, onSelect }) {
  const [businessAreas, setBusinessAreas] = useState([]);
  const [schemas, setSchemas] = useState([]);
  const [datasets, setDatasets] = useState([]);
  const [selectedArea, setSelectedArea] = useState('');
  const [selectedSchema, setSelectedSchema] = useState('');
  const [selectedDataset, setSelectedDataset] = useState('');

  // Load business areas on mount
  useEffect(() => {
    client.getBusinessAreas().then(areas => {
      setBusinessAreas(areas.filter(a => a.active));
    });
  }, [client]);

  // Load schemas when business area changes
  useEffect(() => {
    if (!selectedArea) {
      setSchemas([]);
      setSelectedSchema('');
      return;
    }

    const areaId = businessAreas.find(a => a.name === selectedArea)?.businessAreaId;
    if (areaId) {
      client.getSchemas(areaId).then(schemas => {
        setSchemas(schemas.filter(s => s.active));
        setSelectedSchema('');
      });
    }
  }, [selectedArea, businessAreas, client]);

  // Load datasets when schema changes
  useEffect(() => {
    if (!selectedArea || !selectedSchema) {
      setDatasets([]);
      setSelectedDataset('');
      return;
    }

    const areaId = businessAreas.find(a => a.name === selectedArea)?.businessAreaId;
    const schemaId = schemas.find(s => s.name === selectedSchema)?.schemaId;

    if (areaId && schemaId) {
      client.getDatasets(areaId, schemaId).then(datasets => {
        setDatasets(datasets.filter(ds => !ds.deleted));
        setSelectedDataset('');
      });
    }
  }, [selectedSchema, selectedArea, schemas, businessAreas, client]);

  // Notify parent of selection change
  useEffect(() => {
    onSelect({
      businessArea: selectedArea || null,
      schema: selectedSchema || null,
      dataSet: selectedDataset || null
    });
  }, [selectedArea, selectedSchema, selectedDataset, onSelect]);

  return (
    <div className="dataset-selector">
      <select value={selectedArea} onChange={e => setSelectedArea(e.target.value)}>
        <option value="">All Business Areas</option>
        {businessAreas.map(area => (
          <option key={area.businessAreaId} value={area.name}>{area.name}</option>
        ))}
      </select>

      <select
        value={selectedSchema}
        onChange={e => setSelectedSchema(e.target.value)}
        disabled={!selectedArea}
      >
        <option value="">All Schemas</option>
        {schemas.map(schema => (
          <option key={schema.schemaId} value={schema.name}>{schema.name}</option>
        ))}
      </select>

      <select
        value={selectedDataset}
        onChange={e => setSelectedDataset(e.target.value)}
        disabled={!selectedSchema}
      >
        <option value="">All Datasets</option>
        {datasets.map(ds => (
          <option key={ds.dataSetId} value={ds.dataSetName}>
            {ds.dataSetName} ({ds.recordCount} records)
          </option>
        ))}
      </select>
    </div>
  );
}

Best Practices

Dataset metadata changes less frequently than actual data. Cache the list and refresh periodically:
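const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
const cachedAt = new Map(); // cache key -> time of the last refresh

async function getCachedDatasets(businessAreaId, schemaId) {
  const cacheKey = `${businessAreaId}_${schemaId}`;
  // Track expiry per key; a single shared timestamp would leave other keys stale.
  if (Date.now() - (cachedAt.get(cacheKey) ?? 0) > CACHE_TTL) {
    client._datasets[cacheKey] = null; // force getDatasets to refetch
    cachedAt.set(cacheKey, Date.now());
  }
  return client.getDatasets(businessAreaId, schemaId);
}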
Always filter out deleted datasets in user-facing interfaces:
active_datasets = [
    ds for ds in datasets['items']
    if not ds.get('deleted', False)
]
Regularly check error counts to identify data quality issues:
const datasetsWithErrors = datasets.filter(ds => ds.erroredRecordsCount > 0);
if (datasetsWithErrors.length > 0) {
  console.warn('Datasets with errors:', datasetsWithErrors.map(ds => ds.dataSetName));
}
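Raw error counts can mislead when datasets differ greatly in size, so comparing the error rate (erroredRecordsCount divided by recordCount) may be more informative. The following is a sketch only: the function name findHighErrorDatasets and the 1% default threshold are illustrative assumptions, not Pretectum recommendations.

```javascript
// Sketch: rank active datasets by the share of records that failed validation.
// The 1% default threshold is an arbitrary illustration.
function findHighErrorDatasets(datasets, threshold = 0.01) {
  return datasets
    // Ignore soft-deleted and empty datasets (avoids dividing by zero).
    .filter(ds => !ds.deleted && ds.recordCount > 0)
    .map(ds => ({
      name: ds.dataSetName,
      errorRate: (ds.erroredRecordsCount || 0) / ds.recordCount
    }))
    .filter(entry => entry.errorRate > threshold)
    // Worst datasets first.
    .sort((a, b) => b.errorRate - a.errorRate);
}
```

With the sample response shown earlier (12 errors in 15,420 records, roughly 0.08%), the US Customers dataset would fall below the default threshold and would not be reported.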
When filtering searches, use the dataSetName, not the dataSetId:
# Correct - use name
?dataSet=US%20Customers

# Incorrect - don't use ID
?dataSet=20240925152201042a1b2c3d4e5f6789012345678901234
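
If your application stores dataset IDs (for example, in saved filters), you can translate them to names before searching. This is a sketch that assumes you already hold the items array returned by the list datasets endpoint; the helper name resolveDatasetName is hypothetical.

```javascript
// Sketch: translate a stored dataSetId into the dataSetName that the search filter expects.
// `datasets` is the `items` array from the list datasets response.
function resolveDatasetName(datasets, dataSetId) {
  const match = datasets.find(ds => ds.dataSetId === dataSetId && !ds.deleted);
  if (!match) {
    throw new Error(`No active dataset with ID ${dataSetId}`);
  }
  return match.dataSetName;
}
```

The returned name can then be passed as the dataSet search parameter.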
