Using the Python library

kPAL provides a lightweight Python library for creating, analysing, and manipulating k-mer profiles. It is implemented on top of NumPy.

This is a gentle introduction to the library. Consult the API reference for more detailed documentation.

k-mer profiles

The Profile class is the central object in kPAL. It encapsulates k-mer counts and provides operations on them.

Instead of calling the Profile constructor directly, you should generally use one of the profile construction methods, such as Profile.from_fasta(). The following code creates a 6-mer profile by counting the k-mers in a FASTA file:

>>> from kpal.klib import Profile
>>> p = Profile.from_fasta(open('a.fasta'), 6)

The profile object has several properties. For example, we can ask for the k-mer length (also known as k), the total k-mer count, or the median count per k-mer:

>>> p.length
6
>>> p.total
>>> p.median

Counts are stored as a NumPy ndarray of integers, one for each possible k-mer, in alphabetical order:

>>> len(p.counts)
4096
>>> p.counts
array([ 8, 11,  5, ...,  7, 12, 13])
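To make the layout of such a counts array concrete, here is a minimal sketch in plain NumPy that does not use kPAL itself. The helper count_kmers() and the DIGITS table are hypothetical names introduced for illustration, and skipping k-mers that contain non-ACGT characters is an assumption, not a statement about how kPAL handles them. The sketch fills an array of 4**k integers in alphabetical k-mer order and derives a total and a median from it:

```python
import numpy as np

# Hypothetical table, not part of kPAL: one base-4 digit per nucleotide,
# chosen so that index order equals alphabetical k-mer order.
DIGITS = {'A': 0, 'C': 1, 'G': 2, 'T': 3}


def count_kmers(sequence, k):
    """Count all k-mers of `sequence` into an integer array of length 4**k."""
    counts = np.zeros(4 ** k, dtype=np.int64)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if any(base not in DIGITS for base in kmer):
            continue  # assumption: skip k-mers with ambiguous bases such as N
        index = 0
        for base in kmer:
            index = index * 4 + DIGITS[base]
        counts[index] += 1
    return counts


counts = count_kmers('ACGTACGTAC', 2)
total = int(counts.sum())          # number of k-mers counted
median = float(np.median(counts))  # median count per possible k-mer
```

For k = 2 the array has 16 entries, most of them zero for such a short sequence, which is why the median can be far below the largest count.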

We can get the index in that array for a certain k-mer using the dna_to_binary() method:

>>> i = p.dna_to_binary('AATTAA')
>>> p.counts[i]

Storing k-mer profiles


Differences between k-mer profiles