Methodology · Metodologi

How Sumber treats data — licence, vintage, citation, language.

A five-minute read, the same in English and Bahasa. Each section answers one question that custodians and seekers tend to ask before they trust a number.

Version: 1.0 · 2026-05
Audience: Custodians, seekers, agents
Language: EN · ID, side-by-side

I / LICENCE

Sumber hosts; Sumber does not own. Custodian attribution is preserved on every series.

EN

We host. We do not own.

Every series on Sumber is, and remains, the work of its custodian — BPS, Bank Indonesia, OJK, IDX, PIHPS, a provincial portal, or another public-data agency. Sumber’s job is to make these series queryable in plain language, not to claim them, restate them, or replace the original record.

Where a custodian’s licence permits redistribution, Sumber redistributes. Where it permits only display, Sumber displays without storing a downstream copy. Where the licence requires citation, Sumber cites by default — the citation row beneath every figure is non-removable.

ID

Kami menampung. Kami tidak memiliki.

Setiap seri di Sumber adalah, dan tetap, karya penjaga datanya — BPS, Bank Indonesia, OJK, IDX, PIHPS, portal provinsi, atau lembaga data publik lainnya. Tugas Sumber adalah membuat seri-seri ini dapat di-query dalam bahasa biasa, bukan mengklaimnya, menyatakan ulang, atau menggantikan catatan aslinya.

Ketika lisensi penjaga data mengizinkan redistribusi, Sumber meredistribusikan. Ketika hanya mengizinkan tampilan, Sumber menampilkan tanpa menyimpan salinan turunan. Ketika lisensi mengharuskan sitasi, Sumber memberi sitasi secara default — baris sitasi di bawah setiap angka tidak dapat dihapus.

LICENCE INDEX

Each custodian’s licence terms are linked from the corresponding row in /coverage. The table below maps custodian to licence type and Sumber’s mode of use.

Custodian | Series | Licence type                 | Sumber mode
BPS       | 2,073  | Public, attribution required | Redistribute with attribution
IDX       | 225    | Public, member-of-record     | Display + cite, no raw redistribution
BI        | 68     | Public domain (policy data)  | Redistribute with attribution
PIHPS     | 10     | Public, attribution required | Redistribute with attribution

If you are a custodian and want to amend, restrict, or expand Sumber’s mode of use for your series, write to partners@sumber.io. We adjust within ten business days and post a revision note in the changelog.

II / VINTAGE & VALIDATION

What vintage 2026-04 means, and why we never overwrite a number.

EN

Vintage is the date the number was published — not the date it describes.

Every figure carries a vintage_date — the day the custodian published it. The primary key of a data row is the triple (series_id, period, vintage_date), so a revision never overwrites the prior value: it lands as a new row. The old number stays reachable; it is just no longer the latest. The series_data_latest view is the default read path.

This matters because analysts often need the number as it was known on day X, not as it is known now. Citing a vintage records exactly which publication you quoted, so the citation stays accurate even if BPS later revises the figure, say down by 0.4 points.

ID

Vintage adalah tanggal angka diterbitkan — bukan tanggal yang dijelaskan.

Setiap angka membawa vintage_date — hari penjaga data menerbitkannya. Kunci primer satu baris data adalah triplet (series_id, period, vintage_date), sehingga revisi tidak pernah menimpa nilai sebelumnya: ia hadir sebagai baris baru. Angka lama tetap dapat diakses; hanya saja bukan lagi yang terbaru. View series_data_latest adalah jalur baca default.

Ini penting karena analis sering memerlukan angka seperti yang diketahui pada hari X, bukan yang diketahui sekarang. Mensitasi vintage melindungi Anda dari mengutip angka yang sejak itu direvisi turun 0,4 poin.
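The vintage model above can be sketched with an in-memory SQLite database. The column names series_id, period and vintage_date and the view name series_data_latest come from the text; the table name series_data, the value column, the helper value_as_of, and the figures themselves are assumptions made for illustration, not Sumber's schema or real BPS data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series_data (           -- table name is an assumption
    series_id    TEXT NOT NULL,
    period       TEXT NOT NULL,      -- e.g. '2024-08'
    vintage_date TEXT NOT NULL,      -- ISO date, sorts correctly as text
    value        REAL NOT NULL,
    PRIMARY KEY (series_id, period, vintage_date)
);
-- Default read path: only the newest vintage of each (series, period).
CREATE VIEW series_data_latest AS
SELECT d.series_id, d.period, d.vintage_date, d.value
FROM series_data AS d
WHERE d.vintage_date = (
    SELECT MAX(x.vintage_date)
    FROM series_data AS x
    WHERE x.series_id = d.series_id AND x.period = d.period
);
""")

# Made-up values: a first release, then a revision 0.4 points lower.
# The revision is a new row; the first release is never overwritten.
conn.executemany(
    "INSERT INTO series_data VALUES (?, ?, ?, ?)",
    [
        ("bps.cpi.headline.id.m", "2024-08", "2024-09-02", 100.0),
        ("bps.cpi.headline.id.m", "2024-08", "2024-10-01", 99.6),
    ],
)

def value_as_of(series_id, period, as_of):
    """Return the value as it was known on `as_of`, or None if unpublished."""
    row = conn.execute(
        "SELECT value FROM series_data "
        "WHERE series_id = ? AND period = ? AND vintage_date <= ? "
        "ORDER BY vintage_date DESC LIMIT 1",
        (series_id, period, as_of),
    ).fetchone()
    return row[0] if row else None

latest = conn.execute(
    "SELECT value, vintage_date FROM series_data_latest"
).fetchall()
```

Because the vintage is part of the key, asking for the value as of 2024-09-30 still reaches the first release, while the view only ever shows the revision.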

VALIDATION CLASSES

Every scraper imports rules from a shared validator set. Each rule is one of three classes, which decide what happens when an incoming value fails it:

Class | On failure | Example
HARD  | Row is quarantined; it never enters the catalogue until a human clears it. | CPI outside a plausible domain bound.
SOFT  | Logged as a warning; the row still publishes. | A value near, but inside, a bound.
DELTA | Revision logged; alerts fire when the change exceeds a per-series threshold. | A new vintage moves a figure by > 1 point.
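A minimal sketch of how the three classes could be applied to one incoming value. The function name validate, the Outcome record, and the parameters hard_bounds, soft_margin and delta_threshold are hypothetical; Sumber's actual validator set is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Outcome:
    published: bool
    quarantined: bool = False
    warnings: list = field(default_factory=list)
    revisions: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

def validate(value: float, previous: Optional[float], *,
             hard_bounds: tuple, soft_margin: float,
             delta_threshold: float) -> Outcome:
    low, high = hard_bounds
    # HARD: outside the plausible domain, so the row is quarantined
    # and nothing else is evaluated.
    if not (low <= value <= high):
        return Outcome(published=False, quarantined=True)

    out = Outcome(published=True)
    # SOFT: inside the bounds but close to one; warn and still publish.
    if value - low <= soft_margin or high - value <= soft_margin:
        out.warnings.append(f"value {value} is within {soft_margin} of a bound")
    # DELTA: a changed value is logged as a revision; it alerts only
    # when the change exceeds the per-series threshold.
    if previous is not None and value != previous:
        change = abs(value - previous)
        out.revisions.append(f"revised {previous} -> {value}")
        if change > delta_threshold:
            out.alerts.append(f"revision of {change} exceeds {delta_threshold}")
    return out
```

In this sketch a HARD failure short-circuits, so a quarantined row produces no warnings or alerts until a human clears it and the row is re-validated.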

Separately, each run computes a fingerprint(raw) of the source payload. If the upstream format drifts, the fingerprint changes and the run is quarantined rather than mis-parsed.
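One way such a fingerprint could work is to hash the shape of the payload rather than its values, so that new data leaves it unchanged while a renamed or retyped field alters it. The sketch below assumes a JSON payload and a structural hash; both are assumptions, and the page does not say how fingerprint(raw) is actually computed.

```python
import hashlib
import json

def _type_name(value):
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__

def _shape(node, path=""):
    """List every key path in the payload together with its value type."""
    if isinstance(node, dict):
        lines = []
        for key in sorted(node):
            lines.extend(_shape(node[key], f"{path}/{key}"))
        return lines or [f"{path}:dict"]
    if isinstance(node, list):
        # Simplification: a list is described by its first element only.
        return _shape(node[0], f"{path}[]") if node else [f"{path}:list"]
    return [f"{path}:{_type_name(node)}"]

def fingerprint(raw: str) -> str:
    """SHA-256 over the payload's structure (assumed JSON), not its values."""
    shape = _shape(json.loads(raw))
    return hashlib.sha256("\n".join(shape).encode("utf-8")).hexdigest()
```

A run would compare the result against the fingerprint stored from the last good run and quarantine itself on a mismatch.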

METHODOLOGY VERSIONING

When a custodian rebases a series — BPS moving CPI from a 2018 to a 2024 basket, for example — Sumber does not silently splice the two together. The rebase is recorded in a methodology_versions entry, and figures carry the version that produced them, so a chart spanning a rebase is explicit about the discontinuity.
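As an illustration of not splicing across a rebase, the sketch below splits a series into contiguous segments wherever the methodology version changes, so a chart can draw each segment separately. The Point record and the version labels are hypothetical, and the values are made up.

```python
from itertools import groupby
from typing import NamedTuple

class Point(NamedTuple):
    period: str                # ISO-style period, e.g. '2023-12'
    value: float
    methodology_version: str   # label of the methodology_versions entry

def segments(points):
    """Group points into runs that share one methodology version."""
    ordered = sorted(points, key=lambda p: p.period)
    return [
        (version, list(group))
        for version, group in groupby(ordered, key=lambda p: p.methodology_version)
    ]

series = [
    Point("2023-11", 101.0, "basket-2018"),
    Point("2023-12", 102.0, "basket-2018"),
    Point("2024-01", 100.0, "basket-2024"),
]
parts = segments(series)
```

Here parts holds two segments, one per basket, and the break between them is the discontinuity a chart should make visible.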

GEO · CURRENCY · TIME
  • Geo: coded with BPS administrative codes (national id, provinsi 2-digit, kabupaten 4-digit), hierarchical via a geo_parent link.
  • Currency: stored in native currency. Conversion happens at runtime via JISDOR; Sumber never silently normalises a stored value.
  • Time: period and vintage_date are plain calendar dates in WIB (UTC+7) with no timezone component; ingested_at and last_updated are UTC timestamps.
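The geo hierarchy can be illustrated with the BPS code structure, where a 4-digit kabupaten code begins with its 2-digit provinsi code. The sketch derives the parent from the code itself; Sumber stores the link explicitly as geo_parent, so the derivation and the function names here are assumptions for illustration.

```python
from typing import Optional

def geo_level(code: str) -> str:
    """Classify a geo code as national, provinsi or kabupaten."""
    if code == "id":
        return "national"
    if code.isascii() and code.isdigit():
        if len(code) == 2:
            return "provinsi"
        if len(code) == 4:
            return "kabupaten"
    raise ValueError(f"unrecognised geo code: {code!r}")

def geo_parent(code: str) -> Optional[str]:
    """Return the parent geo code, or None at the national level."""
    level = geo_level(code)
    if level == "national":
        return None
    if level == "provinsi":
        return "id"
    return code[:2]  # kabupaten: the first two digits are the provinsi

def ancestors(code: str) -> list:
    """Walk upward from a code to the national level."""
    chain = []
    parent = geo_parent(code)
    while parent is not None:
        chain.append(parent)
        parent = geo_parent(parent)
    return chain
```

For a kabupaten under provinsi 31, ancestors returns the provinsi first and the national code last.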

To pin a vintage in an API call: GET /v1/series/{slug}?vintage=2024-09-30 returns the most recent revision dated on or before that date. See /api.
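A small helper for building the pinned request URL. The path and the vintage parameter come from the line above; the base URL https://sumber.io is an assumption (the page does not name the API host), and no request is actually sent.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "https://sumber.io"  # assumption: the API host is not stated here

def pinned_series_url(slug: str, vintage: date, base_url: str = BASE_URL) -> str:
    """Build GET /v1/series/{slug}?vintage=YYYY-MM-DD for a given date."""
    path = f"/v1/series/{quote(slug, safe='')}"
    query = urlencode({"vintage": vintage.isoformat()})
    return f"{base_url}{path}?{query}"

url = pinned_series_url("bps.cpi.headline.id.m", date(2024, 9, 30))
```

Passing a date object rather than a string keeps the vintage in ISO form and rejects impossible dates before the call is made.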

III / CITATION FORMAT

One citation format. Used everywhere, embeddable everywhere.

THE SLUG

A Sumber slug is a dot-delimited path that uniquely identifies a series. It is human-readable, machine-stable, and never recycled — if a series is retired, its slug retires with it.

Format: source.domain.indicator.geo.frequency (with an optional trailing qualifier).

  • source — the originating agency, lowercase (bps, bi, idx, pihps).
  • domain — the broad topic (cpi, fx, eqty).
  • indicator — the specific measure (headline, usd, bbca.close).
  • geo · frequency — geo scope then a single frequency letter (d daily, w weekly, m monthly, q quarterly, a annual).

Examples in the wild:

bps.cpi.headline.id.m       // CPI headline, Indonesia, monthly
bi.fx.usd.id.d              // JISDOR USD/IDR reference rate, daily
idx.eqty.bbca.close.id.d    // BBCA close price, daily
pihps.food.rice.id.w        // Rice price, Indonesia, weekly
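
The examples above show that the indicator may itself contain a dot (bbca.close), so a slug cannot be split into five fixed fields. A minimal parsing sketch reads source and domain from the left, geo and frequency from the right, and treats whatever remains as the indicator. It assumes no trailing qualifier is present; the names Slug and parse_slug are hypothetical.

```python
from dataclasses import dataclass

FREQUENCIES = {"d": "daily", "w": "weekly", "m": "monthly",
               "q": "quarterly", "a": "annual"}

@dataclass(frozen=True)
class Slug:
    source: str
    domain: str
    indicator: str
    geo: str
    frequency: str

def parse_slug(slug: str) -> Slug:
    """Split source.domain.indicator.geo.frequency (no trailing qualifier)."""
    parts = slug.split(".")
    if len(parts) < 5 or any(not p for p in parts):
        raise ValueError(f"malformed slug: {slug!r}")
    # Fixed fields come from both ends; the middle is the indicator.
    source, domain, *indicator, geo, frequency = parts
    if frequency not in FREQUENCIES:
        raise ValueError(f"unknown frequency letter: {frequency!r}")
    return Slug(source, domain, ".".join(indicator), geo, frequency)
```

Parsing idx.eqty.bbca.close.id.d this way yields the indicator bbca.close with geo id and frequency d.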

FULL CITATION

Wherever Sumber surfaces a figure as authoritative — on the home page, in /coverage, in an OG card, in an API response — a citation row appears immediately beneath it:

CUSTODIAN   slug.with.dots   vintage YYYY-MM-DD

Embeddable form for journalists and academic citation:

// HTML
<cite>BPS, bps.cpi.headline.id.m, accessed via sumber.io, v.2026-05-01</cite>

// One-line, for a footnote or chart subtitle
Source: BPS · bps.cpi.headline.id.m · vintage 2026-05-01
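
The citation forms above can be generated from three fields: custodian, slug and vintage date. The function names below are hypothetical, and the sketch always writes the vintage as a full YYYY-MM-DD date, following the citation row format.

```python
from datetime import date
from html import escape

def citation_row(custodian: str, slug: str, vintage: date) -> str:
    """The row shown beneath a figure: CUSTODIAN, slug, vintage."""
    return f"{custodian}   {slug}   vintage {vintage.isoformat()}"

def citation_html(custodian: str, slug: str, vintage: date) -> str:
    """Embeddable <cite> element; text fields are HTML-escaped."""
    return (
        f"<cite>{escape(custodian)}, {escape(slug)}, "
        f"accessed via sumber.io, v.{vintage.isoformat()}</cite>"
    )

def citation_line(custodian: str, slug: str, vintage: date) -> str:
    """One-line form for a footnote or chart subtitle."""
    return f"Source: {custodian} · {slug} · vintage {vintage.isoformat()}"

html_form = citation_html("BPS", "bps.cpi.headline.id.m", date(2026, 5, 1))
```

Generating all three forms from the same fields keeps the slug and vintage identical wherever a figure is quoted.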

IV / BILINGUAL POLICY

When and why Bahasa appears inside English copy, and the lang-attribute discipline.

EN

Code-switch with attribution, not by accident.

Every series carries both an English and a Bahasa label — both are NOT NULL in the schema. Indonesian terms appear inside English copy when they are terms of art an Indonesian-news reader would expect untranslated: kurs for exchange rate, IHSG for the composite index. The reverse holds when Bahasa is the lead language.

Every code-switch is wrapped in <span lang="id"> (or lang="en") so screen readers pronounce it correctly. AI-translated labels start at ai_generated quality and are promoted by a human via the admin dashboard.

ID

Alih kode dengan atribusi, bukan secara tidak sengaja.

Setiap seri membawa label Inggris dan Bahasa — keduanya NOT NULL di skema. Istilah Inggris muncul di dalam teks Bahasa ketika merupakan terms of art yang ditemui pembaca berita tanpa diterjemahkan: dividend, ticker. Sebaliknya juga berlaku ketika EN menjadi bahasa utama.

Setiap alih kode dibungkus dalam <span lang="id"> (atau lang="en") sehingga pembaca layar mengucapkannya dengan benar. Label hasil terjemahan AI dimulai pada kualitas ai_generated dan dipromosikan oleh manusia melalui dasbor admin.
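The lang-attribute discipline described above amounts to one small wrapping step wherever a term from the other language is embedded. A minimal sketch, with a hypothetical function name code_switch:

```python
from html import escape

SUPPORTED = {"id", "en"}

def code_switch(term: str, lang: str) -> str:
    """Wrap a term in a span whose lang attribute names its language."""
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language: {lang!r}")
    return f'<span lang="{lang}">{escape(term)}</span>'

# An Indonesian term of art inside English copy.
sentence = f"The {code_switch('kurs', 'id')} closed lower today."
```

The surrounding element keeps the lead language, and only the embedded term is marked, which is what lets a screen reader switch pronunciation for that word alone.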

LEAD LANGUAGE

The lead language for a given visitor follows three rules in order:

  1. If the visitor has set a preference, use it (stored in localStorage.sumber.lang).
  2. Otherwise, if their most recent query was detected as Bahasa Indonesia, use ID.
  3. Otherwise, default to English.

Detection is deliberately conservative. A flip in lead language is always recoverable via the EN / ID toggle, and is announced via aria-live.
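
The three rules can be written as a single function evaluated in order. This is a sketch of the decision logic only: the function name and its two parameters are hypothetical, standing in for the stored preference and the detected language of the most recent query.

```python
from typing import Optional

def lead_language(preference: Optional[str],
                  last_query_lang: Optional[str]) -> str:
    """Resolve the lead language: 'en' or 'id'."""
    # Rule 1: an explicit visitor preference always wins.
    if preference in ("en", "id"):
        return preference
    # Rule 2: otherwise follow a most recent query detected as Bahasa Indonesia.
    if last_query_lang == "id":
        return "id"
    # Rule 3: otherwise default to English.
    return "en"
```

An unset or unrecognised preference falls through to rule 2, so a corrupted stored value cannot leave the visitor without a lead language.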

Sumber.

The Indonesian public data utility. Bilingual, AI-native, vintage-aware. In early preview from Jakarta — coverage expanding through 2026.

Every query you ask is saved anonymously to prioritise what we ingest next. Privacy.
