k-mer profile file format¶
The file format kPAL uses to store k-mer profiles is HDF5. Here we describe the structure within a k-mer profile file.
Versioning¶
The file format is versioned roughly according to semantic versioning. Software designed to work with files in version MAJOR.MINOR.PATCH should be able to work with files in later versions with the same MAJOR version without modification.
Current version: 1.0.0¶
The HDF5 toplevel attributes are:
- format (string) – This is always set to
kMer
. - version (string) – Currently
1.0.0
. - producer (string) – Anything, for example
My k-mer program 1.2.1
.
Each k-mer profile is a dataset under the /profiles
group, named
/profiles/<profile_name>
. The data is a one-dimensional array of integers
of length \(4^k\) (where \(k\) is the k-mer length) and is gzip
compressed. This dataset has the following attributes:
- length (integer): k-mer length (also know as k).
- total (integer): Sum of k-mer counts.
- non_zero (integer): Number of k-mers with a non-zero count.
- mean (float): Mean of k-mer counts.
- median (integer): Median of k-mer counts.
- std (float): Standard deviation of k-mer counts.
Within one file, all profiles must have the same value for the length attribute.
All strings and object names in the file are unicode strings encoded as described in the h5py documentation.
Changes from older versions¶
None yet.