National Center Biobank Network (NCBN) dataset

Summary

The NCBN dataset consists of allele and genotype frequencies obtained through whole-genome sequence (WGS) analysis of control individuals for cancer and rare disease studies, collected by the National Center Biobank Network (NCBN). To account for regional variation, the dataset provides separate frequencies for the mainland Japanese population and the Ryukyu population, which were distinguished by principal component analysis (PCA). Variants that were specific to or not observed in the Japanese population were identified by joint calling with WGS data from the 1000 Genomes Project (1KGP). See the publication below for details.

  • Version/Last update: 2024/6/28
  • Sample size: 9,290 (Japanese), 2,504 (1000 Genomes Project)
  • Number of variants: 215,729,032 (total across the Japanese and 1000 Genomes Project samples)
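
To make the terms "allele frequency" and "genotype frequency" concrete, the following is a minimal sketch of how the two are related at a biallelic autosomal site. It is not the data provider's pipeline; the function names and the genotype counts in the usage lines are hypothetical and chosen only so that they sum to the Japanese sample size of 9,290.

```python
from typing import Tuple


def genotype_frequencies(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, float, float]:
    """Return (hom-ref, het, hom-alt) genotype frequencies among called samples."""
    n_called = n_hom_ref + n_het + n_hom_alt
    if n_called == 0:
        raise ValueError("no called genotypes at this site")
    return (n_hom_ref / n_called, n_het / n_called, n_hom_alt / n_called)


def alt_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Alternative allele frequency at a biallelic autosomal (diploid) site.

    Each heterozygote carries one alternative allele and each alternative
    homozygote carries two, out of 2 * (number of called samples) alleles.
    """
    n_called = n_hom_ref + n_het + n_hom_alt
    if n_called == 0:
        raise ValueError("no called genotypes at this site")
    return (n_het + 2 * n_hom_alt) / (2 * n_called)


# Hypothetical counts (not taken from the dataset): 9,000 + 280 + 10 = 9,290 samples.
example_genotype_freqs = genotype_frequencies(9000, 280, 10)
example_alt_freq = alt_allele_frequency(9000, 280, 10)
```

The sketch assumes every sample has a called diploid genotype; sites on sex chromosomes or with missing calls would need a different denominator.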

Publication

Kawai Y, Watanabe Y, Omae Y, Miyahara R, Khor S-S, Noiri E, et al. (2023) Exploring the genetic diversity of the Japanese population: Insights from a large-scale whole genome sequencing analysis. PLoS Genet 19(12): e1010625. https://doi.org/10.1371/journal.pgen.1010625.

Terms of use

Rights of Data Users

The rights of data users shall conform to "5-2-1. Open Data" in "5-2. Rights of Data Users" listed in the NBDC Human Data Sharing Guidelines.

  1. The data user can freely present the result of the study for which data from the NBDC Human Database are used.
  2. The data user can freely acquire intellectual property rights based on the result of the study for which data from the NBDC Human Database are used.

Responsibilities of Data Users

Terms of "5-3-1. Open Data" in "5-3. Responsibilities of Data Users" listed in the NBDC Human Data Sharing Guidelines shall apply with modification to the responsibilities of data users. As for redistribution of data, terms for controlled-access data shall apply because this dataset was generated by processing controlled-access data.

  1. In using data, the user must take responsibility for and make judgments concerning the quality, content, and scientific validity of the data.
  2. The data user must comply with the following rules.
    • The use of data is limited to the study being undertaken.
    • Identification of individuals is prohibited.
    • Redistribution of data is prohibited.
  3. The data user must add the following citation while using the data in public (e.g. publishing an article).

    Kawai Y, Watanabe Y, Omae Y, Miyahara R, Khor S-S, Noiri E, et al. (2023) Exploring the genetic diversity of the Japanese population: Insights from a large-scale whole genome sequencing analysis. PLoS Genet 19(12): e1010625. https://doi.org/10.1371/journal.pgen.1010625.

Download VCF file created by the data provider [Unrestricted access]

You can download the original VCF file, NCBN-freeze2.sampleQC.GTfilter.freq.vcf.gz (69.24 GB), from the NBDC Human Database.

NBDC Human DB | Study title | Participants | Sample size | Data provider
hum0331.v1.freq.v1 | Construction of control data for the promotion of genomic medicine for cancers and rare diseases | healthy individuals (Japanese) | 9,290 | Katsushi Tokunaga
Total | | | 9,290 |
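
Because the file is large, it is usually more practical to stream it than to load it into memory. The following is a rough sketch using only the Python standard library. It assumes that the file follows the standard VCF column layout and that alternative allele frequencies are stored under an INFO key named AF with one value per alternative allele; the actual key names used in this file are not stated here and should be checked in the VCF header. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def iter_alt_frequencies(
    lines: Iterable[str], key: str = "AF"  # "AF" is an assumed INFO key
) -> Iterator[Tuple[str, int, str, str, float]]:
    """Yield (chrom, pos, ref, alt, frequency) for each alternative allele."""
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:  # not a complete VCF record
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
        raw = parse_info(info).get(key)
        if not raw:
            continue
        # Pair each alternative allele with its comma-separated value.
        for allele, value in zip(alt.split(","), raw.split(",")):
            if value != ".":
                yield chrom, int(pos), ref, allele, float(value)


# Usage sketch (the file must have been downloaded first):
# with gzip.open("NCBN-freeze2.sampleQC.GTfilter.freq.vcf.gz", "rt") as handle:
#     for chrom, pos, ref, alt, freq in iter_alt_frequencies(handle):
#         if freq >= 0.01:
#             print(chrom, pos, ref, alt, freq)
```

Reading line by line keeps memory use flat regardless of file size. For repeated region queries, an indexed reader would be a better fit than this sequential scan.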

List of populations for which frequencies are available

By using the “Alternative allele frequency/count” in the Advanced search, you can search for variants based on the alternative allele frequencies aggregated by population.