What You Need

To predict the habitat preferences of individual prokaryotic species, submit their whole or partial 16S ribosomal RNA/DNA sequence(s) in FASTA format. To calculate the habitat preferences of prokaryotic communities, submit a simple tab-delimited OTU table (an example is available below) along with the 16S rRNA/rDNA sequences (i.e., OTU representative sequences).

While we provide a user-friendly interface below, the raw ProkAtlas database and the pipeline for calculating habitat preference scores, which requires only blast+, python, and pandas, are available here. Please also see our paper reporting the development and use cases of ProkAtlas.


How ProkAtlas Works

ProkAtlas contains 361,474 sequences, each labeled with one environmental category. Any query 16S rRNA gene sequence is subjected to a BLASTn search against the ProkAtlas database, and "significant" hits satisfying the alignment criteria (the sequence similarity and alignment length thresholds) are retained.
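
As a rough illustration of the hit-filtering step, the sketch below reads BLASTn results in the standard BLAST+ tabular format (-outfmt 6) with pandas and keeps only hits that meet both thresholds. The function name filter_hits, the threshold values (97% and 150 bp), and the example rows are assumptions made for illustration; they are not taken from the ProkAtlas pipeline.

```python
import io
import pandas as pd

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def filter_hits(blast_table, min_identity=97.0, min_length=150):
    """Keep hits meeting both thresholds (default values are illustrative)."""
    hits = pd.read_csv(blast_table, sep="\t", names=BLAST_COLUMNS)
    keep = (hits["pident"] >= min_identity) & (hits["length"] >= min_length)
    return hits[keep].reset_index(drop=True)

# Made-up BLAST rows: two pass, one fails identity, one fails length.
example = io.StringIO(
    "otu1\tref_A\t99.2\t250\t2\t0\t1\t250\t10\t259\t1e-120\t450\n"
    "otu1\tref_B\t95.0\t250\t12\t0\t1\t250\t10\t259\t1e-100\t380\n"
    "otu2\tref_C\t98.5\t90\t1\t0\t1\t90\t10\t99\t1e-40\t160\n"
    "otu2\tref_D\t97.0\t150\t4\t0\t1\t150\t10\t159\t1e-70\t260\n"
)
significant = filter_hits(example)
print(significant[["qseqid", "sseqid", "pident", "length"]])
```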

In many cases, BLASTn finds multiple hits across several environmental categories, and the composition of environmental categories within those hits is calculated. This composition, in principle, denotes the habitat preference of the microbe corresponding to the query sequence.
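
The composition described above can be sketched as a simple fraction of hits per category. This assumes that every significant hit counts equally, which may differ from the exact scoring used by ProkAtlas; the function habitat_composition, the category_of mapping, and the category names are hypothetical.

```python
import pandas as pd

def habitat_composition(hits, category_of):
    """Fraction of each environmental category among the hits of each query."""
    labeled = hits.assign(category=hits["sseqid"].map(category_of))
    counts = labeled.groupby(["qseqid", "category"]).size().unstack(fill_value=0)
    # Divide each row by its total so that the fractions of a query sum to 1.
    return counts.div(counts.sum(axis=1), axis=0)

# Made-up significant hits and made-up category labels.
hits = pd.DataFrame({
    "qseqid": ["otu1", "otu1", "otu1", "otu1", "otu2"],
    "sseqid": ["ref_A", "ref_B", "ref_C", "ref_D", "ref_A"],
})
category_of = {"ref_A": "soil", "ref_B": "soil", "ref_C": "soil", "ref_D": "marine"}

composition = habitat_composition(hits, category_of)
print(composition)
```

In this toy input, three of the four hits for otu1 are labeled soil and one marine, so its composition is 0.75 soil and 0.25 marine.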

By compiling this composition for each community member, overall habitat preference scores representing a microbial community can be obtained.
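
One plausible way to compile per-member compositions into community scores is an abundance-weighted average, sketched below. The weighting scheme, the table orientation (OTUs in rows, samples in columns), and the choice to drop OTUs without hits and renormalize are all assumptions for illustration, not a description of the actual ProkAtlas pipeline.

```python
import pandas as pd

def community_scores(otu_table, composition):
    """Abundance-weighted mean of per-OTU habitat compositions, per sample."""
    # Use only OTUs that have a composition (i.e., at least one significant hit).
    shared = otu_table.index.intersection(composition.index)
    abundance = otu_table.loc[shared]
    # Convert counts to relative abundances within each sample (column).
    relative = abundance.div(abundance.sum(axis=0), axis=1)
    # samples x OTUs times OTUs x categories gives samples x categories.
    return relative.T.dot(composition.loc[shared])

# Made-up OTU counts and made-up per-OTU compositions.
otu_table = pd.DataFrame(
    {"sampleA": [30, 10], "sampleB": [0, 20]}, index=["otu1", "otu2"]
)
composition = pd.DataFrame(
    {"soil": [0.75, 1.0], "marine": [0.25, 0.0]}, index=["otu1", "otu2"]
)

scores = community_scores(otu_table, composition)
print(scores)
```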

Although we try to minimize possible errors and biases in ProkAtlas, the results are always affected by "missing data errors". Because ProkAtlas does not cover all the microbes and samples on Earth, rare habitats may often be missed by ProkAtlas.

More details are explained in our paper.



Run ProkAtlas

For Individual Species

Paste 16S rRNA sequence(s) in FASTA format: (Paste example)
- Note that the file size should be smaller than 5MB.
- Please do not include spaces in header lines (i.e. lines starting with ">").
- Please keep each header line to 70 characters or fewer.
- Caveat: Like any other database search, the results are always affected by "missing data errors". In particular, habitat preferences of "rare" species are often biased and should be treated as such.
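
The input requirements listed above can be checked locally before submission. The following is a minimal sketch; the function check_fasta is hypothetical, and it assumes that the 70-character limit applies to the whole header line including ">" and that 5MB means 5 * 1024 * 1024 bytes, neither of which is specified here.

```python
def check_fasta(text, max_bytes=5 * 1024 * 1024, max_header=70):
    """Return a list of human-readable problems found in FASTA text."""
    problems = []
    if len(text.encode("utf-8")) >= max_bytes:
        problems.append("input is not smaller than 5 MB")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # only header lines are checked
        if any(ch.isspace() for ch in line):
            problems.append(f"line {number}: header contains whitespace")
        if len(line) > max_header:
            problems.append(
                f"line {number}: header longer than {max_header} characters"
            )
    return problems

good = ">seq1\nACGTACGT\n"
bad = ">seq 2 from soil\nACGTACGT\n>" + "a" * 70 + "\nACGT\n"

print(check_fasta(good))
print(check_fasta(bad))
```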

Or upload a FASTA file:


Parameter settings:
Sequence similarity threshold (%)
Alignment length threshold (bp)


Email address:
- Your results will be sent to the address here. We do not use your email address anywhere else.
- Provisional (disposable) addresses are accepted.
- Results may not be promptly delivered to Gmail accounts (possibly because of Gmail's internal filtering). Please consider using another mail service (as of May 30, 2022).


Run ProkAtlas

For Community Structures

Paste 16S rRNA sequence(s) in FASTA format: (Paste example)
- Note that the file size should be smaller than 5MB.
- Please do not include spaces in header lines (i.e. lines starting with ">").

Or upload a FASTA file:


Paste OTU table: (Paste example)
- The format of the OTU table must comply with that of the example file. -> Guideline for OTU table format
- OTU names and sequence names in the FASTA file must be consistent.
Or upload a tab-delimited file:
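
Before submitting, the consistency between OTU names and sequence names can be checked locally. The sketch below assumes that the first column of the tab-delimited OTU table holds OTU names and the first row holds sample names; please follow the guideline for the actual format. The helper names fasta_ids and compare_names are hypothetical.

```python
import io
import pandas as pd

def fasta_ids(fasta_text):
    """Collect sequence names from header lines (text after '>')."""
    return {
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }

def compare_names(otu_table_text, fasta_text):
    """Return (OTUs without a sequence, sequences without an OTU row)."""
    table = pd.read_csv(io.StringIO(otu_table_text), sep="\t", index_col=0)
    otus = set(table.index.astype(str))
    seqs = fasta_ids(fasta_text)
    return sorted(otus - seqs), sorted(seqs - otus)

# Made-up inputs: otu2 has no sequence, otu3 has no row in the table.
otu_table_text = "OTU\tsampleA\tsampleB\notu1\t30\t0\notu2\t10\t20\n"
fasta_text = ">otu1\nACGT\n>otu3\nACGT\n"

missing_sequences, missing_rows = compare_names(otu_table_text, fasta_text)
print(missing_sequences, missing_rows)
```

Both returned lists should be empty for a consistent pair of inputs.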


Parameter settings:
Sequence similarity threshold (%)
Alignment length threshold (bp)


Email address:
- Your results will be sent to the address here. We do not use your email address anywhere else.
- Provisional (disposable) addresses are accepted.
- Results may not be promptly delivered to Gmail accounts (possibly because of Gmail's internal filtering). Please consider using another mail service (as of May 30, 2022).


Update History

2020-08-30 Updated pipelines.
2020-06-28 Updated pipelines and database.
2019-10-01 Released ProkAtlas.

Contact