An Assortment of Sorting Algorithms, Part 2

Last time, we studied heap sort — a fast, in-place sorting algorithm. Today, we take a look at quick sort. As the name gives away, this is also a very fast sorting algorithm. While the worst case scales as O(N^2) with the size N of the input data, the expected scaling is just O(N \log N), just as good as heap sort. We will implement it in C++; the code, as always, can be found at the bottom of this post.

Quick Sort

The algorithm works as follows:

  1. Choose a pivot element.
  2. Loop through the array and ensure that all the values lower than the pivot are to the left of it, while the larger values are to the right of it.
  3. Recursively iterate on the left and right subarray.

Let’s address each point in turn. First, we need to choose the pivot element. This choice can be as simple as always choosing the last element of the (sub)array. However, what would happen if the original array is largely sorted? In that case, chances are high that the last element is already the largest. We then use it as the pivot, moving all smaller elements to the left of it — which they already are. The problem arises when we split the array for the recursion. The right subarray contains no elements, as the pivot is already the largest, and all other values now lie to the left of it. The left subarray is just one element shorter.

Now, consider what this means for the recursion. As each recursive call has a high chance of processing a subarray just one element shorter than the last, the number of recursive calls is linearly proportional to the number of elements. In the worst case, the pivot is always the largest value at every level, and thus every single recursion works with data only one element shorter. Hence there will be a total of N recursion levels. As the workload at each level is linear in the number of remaining elements, the total work is N + (N - 1) + \dots + 1 = N(N + 1)/2, resulting in the worst-case running time of O(N^2).

There are several ways to ameliorate this. One option is to choose the pivot at random. This drastically lowers the chance of choosing the largest element as the pivot, even if the array is nearly sorted. Another solution is to choose more than one element, find their median, and use that value as the pivot. Again, the chance that the median of several values coincides with one of the extreme values of the array is, for most intents and purposes, tiny.


Next, we address the second point. How do we actually pivot the elements? Before we start, we move the pivot to the end of the array; the reason for this becomes apparent shortly. We introduce two indices. The first, i, will loop through the whole array from the left. The second, j, marks the position of the right-most element lower than the pivot. Whenever i encounters a value smaller than the pivot, the index j is incremented (now pointing to the left-most value higher than the pivot), and the values at indices i and j are swapped. Once i reaches the end of the array, the index j is, logically, pointing to the last value that was swapped. Since we put the pivot at the end of the array, j is the index of the pivot, which we can return. We will call this procedure partition(). Its operation is perhaps easiest to understand when illustrated on a practical example:

9 is larger than the pivot (8, indicated by the arrow), and thus nothing happens.
11 is again larger than the pivot, we move on.
7 is smaller than the pivot, thus gets colored dark grey. We advance the index j by one, and swap the elements in positions i and j.
14, again, is larger than the pivot.
4 is smaller than the pivot, thus we color it dark grey, advance index j, and swap.
9 is larger than the pivot, we continue…
6 is smaller, so we advance j and swap.
2 is also smaller, so we swap again…
… and finally, we advance index j one last time, and swap the pivot into its designated position.

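The same run in condensed form: each line below shows the array after one step, with the pivot 8 already moved to the last position and j starting just left of the array.

start: [9, 11, 7, 14, 4, 9, 6, 2, 8]   j = -1 (9 and 11 are larger than the pivot, nothing happens)
i = 2: [7, 11, 9, 14, 4, 9, 6, 2, 8]   7 <= 8: j -> 0, swap positions 0 and 2
i = 4: [7, 4, 9, 14, 11, 9, 6, 2, 8]   4 <= 8: j -> 1, swap positions 1 and 4
i = 6: [7, 4, 6, 14, 11, 9, 9, 2, 8]   6 <= 8: j -> 2, swap positions 2 and 6
i = 7: [7, 4, 6, 2, 11, 9, 9, 14, 8]   2 <= 8: j -> 3, swap positions 3 and 7
i = 8: [7, 4, 6, 2, 8, 9, 9, 14, 11]   pivot <= pivot: j -> 4, the pivot lands at index 4

partition() then returns 4: all values left of index 4 are smaller than the pivot, and all values to its right are larger.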
Finally, as per the third step, we use the index of the pivot from the partition() procedure to split the array into two subarrays, and recursively apply our quick_sort() to both of them.

Implementation

We will implement quick_sort() like this:

template <class T>
void quick_sort(std::vector<T> & vec, int left, int right)
{
    if (left >= right)
    {
        return;
    }

    T pivot = select_pivot(vec, left, right);
    int q = partition(vec, pivot, left, right);
    
    quick_sort(vec, left, q - 1);
    quick_sort(vec, q + 1, right);
}

We first check whether our current (sub)array consists of at most one element. In that case there’s nothing more to do and we can return. If there’s more than one element, we select a value to pivot around using the function select_pivot(). Whichever selection strategy we choose, this is where it will be implemented.

Next, we partition the array as described above and find the index of the pivot. Finally, we recursively sort the left and the right subarray. Pretty straightforward so far.


The procedure partition() can be implemented as:

template <class T>
int partition(std::vector<T> & vec, const T pivot, int left, int right)
{
    int j = left - 1;
    for (int i = left; i <= right; ++i)
    {
        if (vec[i] <= pivot)
        {
            ++j;
            std::swap(vec[i], vec[j]);
        }
    }
    return j;
}

We set the index j pointing to the left of the left-most array index. This looks dangerous, but keep in mind that in the following, we first increment j before using it to access any array elements. As discussed previously, we loop from left to right using index i. If the current value is larger than the pivot, j remains pointing at the right-most element lower than the pivot. Whenever we encounter a value lower than or equal to the pivot, however, index j is incremented to point to the first value larger than the pivot, and we swap the values at indices i and j. This way, j is now pointing to this smaller value, which is now located just to the right of the previously right-most value — in other words, all is well. At the very end of the loop, the pivot itself gets swapped, and the index j remains pointing to it. Hence we return j as the index of the pivot.


What about the choice of the pivot? The easiest way is to choose the last element:

template <class T>
T select_pivot_last(std::vector<T> & vec, int left, int right)
{
    return vec[right];
}

However, if the array is nearly sorted, this might lead to very unfavorable splitting of the array into subarrays. Too many of those, and our algorithm slows down considerably.

Hence, we may select a random element to serve as pivot:

#include <cstdlib>

template <class T>
T select_pivot_random(std::vector<T> & vec, int left, int right)
{
    int random = left + std::rand() % (right - left + 1);
    std::swap(vec[random], vec[right]);
    return vec[right];
}

This ensures that, while it’s still possible to get a really bad partitioning now and then, the overall partitions will generally be acceptable.

The last discussed choice could look like this:

template <class T>
T select_pivot_median(std::vector<T> & vec, int left, int right)
{
    int middle = left + (right - left) / 2;
    if (vec[left] > vec[middle])
        std::swap(vec[left], vec[middle]);
    if (vec[middle] > vec[right])
        std::swap(vec[middle], vec[right]);
    if (vec[left] > vec[right])
        std::swap(vec[left], vec[right]);

    std::swap(vec[middle], vec[right]); 
    return vec[right];
}

Here, we select the median of three values: the left-most, the middle, and the right-most elements of the array. We first sort these three values, then choose the middle one. Finally, we move it to the right-most position. This pivot selection is often called the median-of-three method. If there are k smallest values present in an array of length N, the chance of selecting one of them as pivot is \dfrac{k}{N} for the select_pivot_last() and select_pivot_random() procedures, but only \dfrac{k(k-1)}{N(N-1)} \sim \dfrac{k^2}{N^2} in the case of select_pivot_median().


Finally, we may want to define a wrapper function that sorts an array without the need to supply the starting and final indices. We can simply overload the quick_sort() function with a single argument:

template <class T>
void quick_sort(std::vector<T> & vec)
{
    int left = 0;
    int right = static_cast<int>(vec.size()) - 1;
    quick_sort(vec, left, right);
}
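
As a quick usage sketch, assuming the complete header reproduced at the end of this post is saved as quick_sort.h (the file name is a choice of ours, not fixed by anything above), sorting a small vector looks like this:

#include <iostream>
#include <vector>

#include "quick_sort.h" // hypothetical file name for the header below

int main()
{
    std::vector<int> data{ 9, 11, 7, 14, 4, 9, 6, 2, 8 };
    quick_sort(data);

    // prints: 2 4 6 7 8 9 9 11 14
    for (int value : data)
    {
        std::cout << value << ' ';
    }
    std::cout << '\n';
    return 0;
}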


In conclusion, quick sort, while quite slow in the worst case scenario, is rather fast in the general case. Investing a few instructions in better selecting the pivot can speed up the algorithm in pathological cases. Just like heap sort, quick sort sorts the array in place, meaning no extra memory, nor its costly allocation, is required.


For the sake of convenience, the header file with the quick sort implementation is reproduced below:

#pragma once
#include <cstdlib>
#include <algorithm>
#include <vector>

template <class T>
T select_pivot_last(std::vector<T> & vec, const int left, const int right)
{
    return vec[right];
}

template <class T>
T select_pivot_median(std::vector<T> & vec, const int left, const int right)
{
    int middle = left + (right - left) / 2;
    if (vec[left] > vec[middle])
        std::swap(vec[left], vec[middle]);
    if (vec[middle] > vec[right])
        std::swap(vec[middle], vec[right]);
    if (vec[left] > vec[right])
        std::swap(vec[left], vec[right]);

    std::swap(vec[middle], vec[right]);
    return vec[right];
}

// make sure the random number generator is initialized first
template <class T>
T select_pivot_random(std::vector<T> & vec, const int left, const int right)
{
    int random = left + std::rand() % (right - left + 1);
    std::swap(vec[random], vec[right]);
    return vec[right];
}

template <class T>
int partition(std::vector<T> & vec, const T pivot, const int left, const int right)
{
    int j = left - 1;
    for (int i = left; i <= right; ++i)
    {
        if (vec[i] <= pivot)
        {
            ++j;
            std::swap(vec[i], vec[j]);
        }
    }
    return j;
}

template <class T>
void quick_sort(std::vector<T> & vec, const int left, const int right)
{
    if (left >= right)
    {
        return;
    }

    T pivot = select_pivot_median(vec, left, right);
    //        select_pivot_random(vec, left, right);
    //        select_pivot_last(vec, left, right);
    int q = partition(vec, pivot, left, right);
    
    quick_sort(vec, left, q - 1);
    quick_sort(vec, q + 1, right);
}

template <class T>
void quick_sort(std::vector<T> & vec)
{
    int left = 0;
    int right = static_cast<int>(vec.size()) - 1;
    quick_sort(vec, left, right);
}

An Assortment of Sorting Algorithms, Part 1

For many reasons, sorting algorithms are an integral topic of any computer science course. Firstly, they are immensely useful from a practical point of view — both directly, when trying to sort some data, and indirectly, by enabling or improving other, more advanced algorithms. Secondly, the wide variety of sorting algorithms is an ideal tool to understand critical concepts such as space and time complexity, O(N) notation, recursion, and algorithm analysis.

While modern programming languages and libraries take care of choosing the appropriate sorting algorithm for you, it is still necessary to have at least a basic understanding of what is happening ‘under the hood’. In this series of posts, we will explore a select few algorithms, and implement them in C++.

Heap Sort

The underlying principle of heap sort is the following:

  1. Identify the largest element of the (sub-)array.
  2. Move it to the very end.
  3. Exclude the last (now highest remaining) element by shortening the sub-array.
  4. Repeat.

This process hinges on how fast we can isolate the largest member of an array. If we can do it in constant time, then the whole algorithm takes place in O(N), as we simply carry out a constant number of operations at each iteration over the length N of the array. If, on the contrary, finding the largest element takes, say, linear time, we are stuck with an O(N^2) algorithm.

So, what is an efficient way to find the highest value in an array? Naively, one might assume that we would need to sort the array in the first place. This, however, is not required. Moving the largest value to the first element of the array is, so to speak, a weaker property than ordering the whole array: in the former case, we don’t care what the actual order of all the other values is. So it should not come as a surprise that the former can be accomplished in a more efficient manner than the latter (sorting the whole sub-array repeatedly).

The extraction of the largest value relies on a so-called max-heap. A max-heap is a special instance of a heap — a binary tree. The ‘max’ qualifier identifies a tree with the property that the value of every parent node is larger than the value of either of its two children. Naturally, the largest value in a max-heap will be located at the root node.

If we associate our sub-array with a heap, we can ensure that at each iteration the heap is a max-heap, and hence extract the largest value from the root. Luckily, it turns out that ‘max-heapifying’ a heap is a procedure that can be accomplished in logarithmic time! Therefore, the combination of such a procedure and a loop over the length of the original array leads to an O(N\log N) time cost! And that’s as good as it gets in the general case.

How it Works

First, let’s draw the correspondence between a heap and the array to be sorted:

Our array on the left, and the associated heap on the right. The heap is built by arranging the array elements consecutively into layers of 1, 2, 4, 8, … elements. Each parent has at most two children.

The above heap is clearly not a max-heap. How would we go about turning it into one? The property that needs to be satisfied is that each parent node holds a value larger than those of its two children. Hence, we could iterate over all nodes and check whether this is true. If not, we simply exchange the offending numbers. In fact, we don’t even need to iterate over all nodes, only those that have children, i.e. all with the exception of leaves. For every node, we check whether its left child and/or right child is larger. If yes, we exchange the values of the parent and of the larger of the two children. Furthermore, while the exchange will lead to our criterion being satisfied for that particular parent and child nodes, it might upset the max-heap property of the whole branch. Hence, if we exchange the values of a parent and a child, we should reiterate the procedure with the offending, now swapped, child.

For example, let’s consider the node number 1 with the value 11. We check whether its left child or its right child is larger. We find that the left child (node 3), indeed, is, having a value of 14. We swap the values of nodes 1 and 3. We then move down a level, and repeat this procedure for the offending node 3. Its children are nodes 7 and 8, both of which are smaller. As these are the leaves of the tree, we stop the recursion here. Let us call this procedure max_heapify().

Repeating this procedure for every non-leaf (internal) node will turn our heap into a max-heap. Each call to max_heapify() costs us at most two comparisons and one swap, a constant number of operations. If a swap occurs at every level and we need to recursively call max_heapify() on the child node each time, we face at most \log N calls, as there are up to \log N levels in the heap. Thus, the cost of max_heapify() is O\left(\log N\right).

There are \left\lfloor\dfrac{N}{2}\right\rfloor internal nodes. Consider a complete binary tree such that every level is completely filled. As each level contains a number of nodes equal to a power of two, the last level (made up of all the leaves) will contain as many nodes as all previous levels combined, plus one. In other words, the number of leaves of a complete binary tree is \left\lceil\dfrac{N}{2}\right\rceil, and the number of internal nodes is \left\lfloor\dfrac{N}{2}\right\rfloor. Let’s consider a tree of 7 nodes. The above would lead to 3 internal nodes and 4 leaves. If we now add one more element (N = 8), we necessarily turn one of the leaves into an internal node, and \left\lfloor\dfrac{8}{2}\right\rfloor indeed is equal to 4. The number of leaves remains the same (we added one leaf, but we turned another leaf into an internal node). Adding yet another node should not increase the number of internal nodes, just the number of leaves. One can easily verify that the above expression holds true in this case as well.

All in all, to turn our heap into a max-heap, we need to call the O(\log N) procedure max_heapify() exactly \left\lfloor\dfrac{N}{2}\right\rfloor times, once for each internal node. If we call this the build_max_heap() procedure, we immediately see that its cost scales with the number of array elements as O( N \log N).


Once the array is reordered, upon calling build_max_heap(), in such a way as to correspond to a max-heap, we can proceed with the sorting as described previously:

We take the largest element and move it to the end of the array by swapping the first and the last element. We shorten the length of the heap by one:

Node 0 holds a value smaller than either of its children. Node 1 is the larger of the two, so we swap the values of node 0 and node 1. The greyed out array elements are already sorted and thus ignored.

and call the max_heapify() procedure on the first element, the only one possibly upsetting the max-heap property in the shortened heap:

We recursively follow down the branch, focusing on node 1 now. Node 3 is the larger of the children of node 1, a swap and a recursive call take place.
Node 3 only has one child now, and holds a smaller value, thus we swap them again.
We have reached the bottom of the heap, and have achieved the max-heap property. The largest value is held by the root node.
We swap the last value of our current heap with the value in the root node, and shorten the heap by one,
… and so on and so forth …

until we are left with a heap of length one, at which point the array is necessarily sorted in ascending order. This procedure is hence the heap_sort() function.

Implementation

The implementation of heap_sort() will, therefore, look like this:

template <class T>
void heap_sort(std::vector<T> & vec)
{
    build_max_heap(vec);
    int heap_size = vec.size();
    for (int i = vec.size() - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[0]);
        --heap_size;
        max_heapify(vec, 0, heap_size);
    }
}

As discussed above, the heap_sort() first builds a max-heap via the build_max_heap() function. Then we iterate over each element starting from the end of the array. We swap the first array element (the largest value) with the element at the end of the heap. This done, we reduce the size of the heap, and call max_heapify() on the newly swapped element to ensure the max-heap property in the following iteration.

The build_max_heap() function simply loops over the first half of the array values and repeatedly calls max_heapify() on each element:

template <class T>
void build_max_heap(std::vector<T> & vec)
{
    int heap_size = vec.size();
    for (int i = heap_size / 2 - 1; i >= 0; --i)
    {
        max_heapify(vec, i, heap_size);
    }
}

All the magic is thus happening inside the max_heapify() routine:

template <class T>
void max_heapify(std::vector<T> & vec, int i, int heap_size)
{
    int l{ left(i) };
    int r{ right(i) };
    int largest{ 0 };

    if (l < heap_size && vec[l] > vec[i])
    {
        largest = l;
    }
    else
    {
        largest = i;
    }

    if (r < heap_size && vec[r] > vec[largest])
    {
        largest = r;
    }
    
    if (largest != i)
    {
        std::swap(vec[i], vec[largest]);
        max_heapify(vec, largest, heap_size);
    }
    return;
}

We first find the index of the node holding the largest value among the current node and its two children. If it’s not the current node that holds the largest value, we swap values with the node that does. Finally, as this swapping might upset the max-heap property of the child nodes, we recursively call max_heapify() on the relevant child node. The left() and right() procedures are just helper functions that return the index of the left and the right child node, respectively:

int left(int i) { return 2 * i + 1; }
int right(int i) { return 2 * i + 2; }


In conclusion, heap sort is a fast sorting algorithm that sorts an array in place (no extra memory required), and does so in O(N\log N) time. The implementation provided here assumes the array is in the form of a standard library vector, but it can trivially be generalized for other containers as well. Personally, what I find the most fascinating about heap sort is how a seemingly unrelated data structure (a heap, a binary tree) can be exploited to achieve such an elegant solution to the sorting problem.
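
As a minimal usage sketch, assuming the header below is saved as heap_sort.h (a file name of our choosing):

#include <iostream>
#include <vector>

#include "heap_sort.h" // hypothetical file name for the header below

int main()
{
    std::vector<double> data{ 3.5, 1.2, 9.9, 2.7, 5.0 };
    heap_sort(data);

    // prints: 1.2 2.7 3.5 5 9.9
    for (double value : data)
    {
        std::cout << value << ' ';
    }
    std::cout << '\n';
    return 0;
}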


For the sake of convenience, the whole header file is reproduced below:

#pragma once
#include <algorithm>
#include <vector>

int left(int i) { return 2 * i + 1; }
int right(int i) { return 2 * i + 2; }

template <class T>
void max_heapify(std::vector<T> & vec, int i, int heap_size)
{
    int l{ left(i) };
    int r{ right(i) };
    int largest{ 0 };

    if (l < heap_size && vec[l] > vec[i])
    {
        largest = l;
    }
    else
    {
        largest = i;
    }

    if (r < heap_size && vec[r] > vec[largest])
    {
        largest = r;
    }
    
    if (largest != i)
    {
        std::swap(vec[i], vec[largest]);
        max_heapify(vec, largest, heap_size);
    }
    return;
}

template <class T>
void build_max_heap(std::vector<T> & vec)
{
    int heap_size = vec.size();
    for (int i = heap_size / 2 - 1; i >= 0; --i)
    {
        max_heapify(vec, i, heap_size);
    }
}

template <class T>
void heap_sort(std::vector<T> & vec)
{
    build_max_heap(vec);
    int heap_size = vec.size();
    for (int i = vec.size() - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[0]);
        --heap_size;
        max_heapify(vec, 0, heap_size);
    }
}

Quantum Tunneling, Part 3

Last time, we solved the problem of a rectangular potential barrier by deriving the tunneling probability of a quantum particle. Today, we will attempt to solve the problem numerically by simulating a particle hitting a potential barrier and, hopefully, tunneling through it.


One thing to note: the analytic solution from last time was that of a time-independent problem. However, writing a simulation to model the same case is, frankly, boring. Instead, let us simulate the time-dependent problem represented by the Schrödinger equation:

H\psi(x, t) = i\hbar\dfrac{\partial \psi(x, t)}{\partial t}

Moreover, instead of having the wave function \psi(x, t) represent a plane wave solution, we will look at the time evolution of a localized Gaussian wave packet.

We start by importing several packages. In particular, we will require the sparse package to represent sparse matrices, and the sparse.linalg module to manipulate them:

import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt 
import scipy as sp
from scipy.sparse import linalg as ln
from scipy import sparse as sparse
import matplotlib.animation as animation

Next, we create the Wave_Packet class. It will require multiple parameters to instantiate:

class Wave_Packet:
    def __init__(self, n_points, dt, sigma0=5.0, k0=1.0, x0=-150.0, x_begin=-200.0,
                 x_end=200.0, barrier_height=1.0, barrier_width=3.0):

        self.n_points = n_points
        self.sigma0 = sigma0
        self.k0 = k0
        self.x0 = x0
        self.dt = dt
        self.prob = sp.zeros(n_points)
        self.barrier_width = barrier_width
        self.barrier_height = barrier_height
        
        """ 1) Space discretization """
        self.x, self.dx = sp.linspace(x_begin, x_end, n_points, retstep=True)        

        """ 2) Initialization of the wave function to Gaussian wave packet """
        norm = (2.0 * sp.pi * sigma0**2)**(-0.25)
        self.psi = sp.exp(-(self.x - x0)**2 / (4.0 * sigma0**2))
        self.psi *= sp.exp(1.0j * k0 * self.x)
        self.psi *= norm

        """ 3) Setting up the potential barrier """
        self.potential = sp.array(
            [barrier_height if 0.0 < x < barrier_width else 0.0 for x in self.x])

        """ 4) Creating the Hamiltonian """
        h_diag = sp.ones(n_points) / self.dx**2 + self.potential
        h_non_diag = sp.ones(n_points - 1) * (-0.5 / self.dx**2)
        hamiltonian = sparse.diags([h_diag, h_non_diag, h_non_diag], [0, 1, -1])
        
        """ 5) Computing the Crank-Nicolson time evolution matrix """
        implicit = (sparse.eye(self.n_points) - dt / 2.0j * hamiltonian).tocsc()
        explicit = (sparse.eye(self.n_points) + dt / 2.0j * hamiltonian).tocsc() 
        self.evolution_matrix = ln.inv(implicit).dot(explicit).tocsr()

We then carry on as follows:

  1. Discretize space using a uniformly spaced grid x of step dx.
  2. Initialize the wave function as a Gaussian packet. We first prepare a Gaussian distribution of deviation sigma0 centered at x0, which we then multiply by a complex exponential representing the initial momentum k0, and finally normalize:

\psi(x) = \frac{1}{\sqrt[4]{2 \pi \sigma^2}}e^{-\left(\frac{x - x_0}{2 \sigma}\right)^2} e^{i k_0 x}

  3. Set up the potential as a zero array, except for the barrier, which is located just to the right of the origin of the coordinate system.
  4. Calculate the Hamiltonian matrix. Here we adopt the atomic unit system, in which the Hamiltonian assumes the simple form:

H = -\dfrac{1}{2}\Delta + V(x)

The action of the Laplace operator on a generic function f can be discretized according to the finite difference scheme:

\Delta f \approx \dfrac{f[x-\text{d}x] - 2f[x] + f[x + \text{d}x]}{\text{d}x^2}

and hence the Laplace operator itself may be represented by the finite difference stencil \left[1,\ -2,\ 1\right]. Taking the potential into account as well, the discretized Hamiltonian takes the form of the matrix:

\dfrac{1}{2\,\text{d}x^2} \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & & \ddots \end{pmatrix} + \begin{pmatrix} V[0] & & & \\ & V[1] & & \\ & & V[2] & \\ & & & \ddots \end{pmatrix}

In the script, we first create the arrays representing the diagonal and off-diagonal elements. Then, we use the sparse.diag() function to build a sparse matrix by specifying the diagonal elements and their respective distance from the main diagonal:

class Wave_Packet:
    ...
        """ 4) Creating the Hamiltonian """
        h_diag = sp.ones(n_points) / self.dx**2 + self.potential
        h_non_diag = sp.ones(n_points - 1) * (-0.5 / self.dx**2)
        hamiltonian = sparse.diags([h_diag, h_non_diag, h_non_diag], [0, 1, -1])
    ...

  5. Finally, compute the time evolution operator. Given our differential equation (in atomic units, \hbar = 1):

H\psi(x, t) = i\dfrac{\partial \psi(x, t)}{\partial t}

we can discretize it in time as:

\dfrac{\psi(x, t + \text{d}t) -  \psi(x, t) }{\text{d}t}  \approx -i\,H\psi(x, t)

This is called the forward Euler method, or the explicit method, as we can easily express the value of the desired function at the next time step from its value at the current step. Likewise, we can carry out a backward Euler method by imposing:

\dfrac{\psi(x, t + \text{d}t) -  \psi(x, t) }{\text{d}t}  \approx -i\,H\psi(x, t + \text{d}t)

This, instead, is an implicit formula — the value of the function at the next time step is present on both sides of the equation.

Combining these two methods yields the so-called Crank-Nicolson scheme, which combines adequate accuracy and stability, and incurs a relatively modest computational cost:

\dfrac{\psi(x, t + \text{d}t) -  \psi(x, t) }{\text{d}t}  \approx \dfrac{1}{2i}\,H\psi(x, t) +  \dfrac{1}{2i}\,H\psi(x, t + \text{d}t)

\psi(x, t + \text{d}t) -  \psi(x, t)   = \dfrac{ \text{d}t }{2i}\,H\psi(x, t) +  \dfrac{ \text{d}t }{2i}\,H\psi(x, t + \text{d}t)

\left(1 -   \dfrac{ \text{d}t }{2i}\,H \right)   \psi(x, t + \text{d}t)  =  \left(1 + \dfrac{ \text{d}t }{2i}\,H\right)  \psi(x, t)

Isolating the value of \psi(x, t + \text{d}t) we eventually get:

\psi(x, t + \text{d}t) = \left(1 -   \dfrac{ \text{d}t }{2i}\,H \right)^{-1}   \left(1 + \dfrac{ \text{d}t }{2i}\,H\right)  \psi(x, t)

Thus, we create the implicit and explicit matrices. According to the documentation, the matrix to be inverted should be in the compressed sparse column (csc) format for maximum performance, hence we convert both matrices to the csc representation. However, matrix-vector multiplication is faster when the matrix is in the compressed sparse row (csr) format:

class Wave_Packet:
    ...
        """ 5) Computing the Crank-Nicolson time evolution matrix """
        implicit = (sparse.eye(self.n_points) - dt / 2.0j * hamiltonian).tocsc()
        explicit = (sparse.eye(self.n_points) + dt / 2.0j * hamiltonian).tocsc() 
        self.evolution_matrix = ln.inv(implicit).dot(explicit).tocsr()


Now that everything is set up nicely, the method evolve(), which takes care of advancing the state of the system one step ahead in time, can be quite simple and straightforward:

class Wave_Packet:
    ...
    def evolve(self):
        self.psi = self.evolution_matrix.dot(self.psi)
        self.prob = abs(self.psi)**2

        norm = sum(self.prob)
        self.prob /= norm
        self.psi /= norm**0.5

        return self.prob

We first compute the wave function at the next time step by applying the time evolution matrix to the previous wave function. What we are interested in, however, is the probability density — the modulus squared of the wave function. Before we return the density, we normalize both quantities.

Animation

As we have done something very similar in a recent post, we will move quickly here. In the constructor, apart from setting up the figure, we plot the potential barrier (its height reduced, just for a nicer visual presentation), as well as display a timer:

class Animator:
    def __init__(self, wave_packet):
        self.time = 0.0
        self.wave_packet = wave_packet
        self.fig, self.ax = plt.subplots()
        plt.plot(self.wave_packet.x, self.wave_packet.potential * 0.1, color='r')
        
        self.time_text = self.ax.text(0.05, 0.95, '', horizontalalignment='left',
            verticalalignment='top', transform=self.ax.transAxes)
        self.line, = self.ax.plot(self.wave_packet.x, self.wave_packet.evolve())
        self.ax.set_ylim(0, 0.2)
        self.ax.set_xlabel('Position (a$_0$)')
        self.ax.set_ylabel('Probability density (a$_0^{-1}$)')

    ...

The update(data) method simply updates the plot of the probability density. The actual data to be plotted at each time step is computed in the time_step() method, which yields the return value of the wave packet’s evolve() at each call. We also take care of updating the time display. Note that we are working in atomic units in general, but convert to SI here for the sake of clarity. Finally, the animate() method takes care of creating the animation:

class Animator:
    ...
    def update(self, data):
        self.line.set_ydata(data)
        return self.line,
    
    def time_step(self):
        while True:
            self.time += self.wave_packet.dt
            self.time_text.set_text(
                'Elapsed time: {:6.2f} fs'.format(self.time * 2.419e-2))
          
            yield self.wave_packet.evolve()
    
    def animate(self):
        self.ani = animation.FuncAnimation(
            self.fig, self.update, self.time_step, interval=5, blit=False)

Results and Discussion

Let’s set up a quantum particle with a Gaussian profile on a collision course with a potential barrier. What will happen?

Part of the particle appears to tunnel through the barrier! If we were to place a detector behind the barrier, it might measure the particle appearing behind it — although the probability of that will be quite a bit lower compared to the chance of finding the particle in front of the barrier, where we classically expect it to be.

The system behaves as we hoped it would — qualitatively, that is. But does the simulation agree with our predictions quantitatively as well? Let us integrate the probability density on both sides of the barrier at each time step. What we observe is, naturally, that the probability of the particle being on the left of the barrier starts out equal to one, but after a few femtoseconds the particle tunnels through and we have a non-zero probability to observe it on the right of the barrier:

The ratio of the probability to the right and to the left of the barrier gives us the tunneling probability — or transmission coefficient, as we called it last time:

T = \dfrac{1}{1 + \dfrac{V_0^2}{4E(V_0 - E)}\sinh^2\left(\dfrac{\sqrt{2m(V_0 - E)}a}{\hbar}\right)}
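
For reference, this expression is simple to evaluate numerically. A minimal sketch in atomic units (\hbar = m = 1, consistent with the simulation; the function name transmission is my own):

import numpy as np

def transmission(E, V0, a):
    """Analytic transmission coefficient for a particle of energy E < V0."""
    q = np.sqrt(2.0 * (V0 - E))                    # hbar = m = 1
    prefactor = V0**2 / (4.0 * E * (V0 - E))
    return 1.0 / (1.0 + prefactor * np.sinh(q * a)**2)

# e.g. transmission(0.5, 1.0, 3.0) gives roughly 0.0099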

To verify a possible quantitative agreement of our numerical model with the above equation, I’ve run several simulations with different barrier widths and heights:

Tunneling probability as a function of barrier width (top) and height (bottom). The points represent simulation results, the full line corresponds to the analytic solution.

The agreement is not perfect. In all honesty, I had to play around a bit with some of the parameters in the analytic solution to better match the curve to the data points. However, a perfect correlation is not expected — the analytic solution was derived for plane waves, whereas here we simulate a wave packet. In any case, we observe what seems like the correct behaviour: the thicker and higher the barrier, the lower the probability of the particle tunneling through it.

This is the last post in the current series. Until next time!


For the sake of convenience, you can find the whole script below:

import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt 
import scipy as sp
from scipy.sparse import linalg as ln
from scipy import sparse as sparse
import matplotlib.animation as animation

class Wave_Packet:
    def __init__(self, n_points, dt, sigma0=5.0, k0=1.0, x0=-150.0, x_begin=-200.0,
                 x_end=200.0, barrier_height=1.0, barrier_width=3.0):

        self.n_points = n_points
        self.sigma0 = sigma0
        self.k0 = k0
        self.x0 = x0
        self.dt = dt
        self.prob = sp.zeros(n_points)
        self.barrier_width = barrier_width
        self.barrier_height = barrier_height
        
        """ 1) Space discretization """
        self.x, self.dx = sp.linspace(x_begin, x_end, n_points, retstep=True)        

        """ 2) Initialization of the wave function to Gaussian wave packet """
        norm = (2.0 * sp.pi * sigma0**2)**(-0.25)
        self.psi = sp.exp(-(self.x - x0)**2 / (4.0 * sigma0**2))
        self.psi *= sp.exp(1.0j * k0 * self.x)
        self.psi *= norm

        """ 3) Setting up the potential barrier """
        self.potential = sp.array(
            [barrier_height if 0.0 < x < barrier_width else 0.0 for x in self.x])

        """ 4) Creating the Hamiltonian """
        h_diag = sp.ones(n_points) / self.dx**2 + self.potential
        h_non_diag = sp.ones(n_points - 1) * (-0.5 / self.dx**2)
        hamiltonian = sparse.diags([h_diag, h_non_diag, h_non_diag], [0, 1, -1])
        
        """ 5) Computing the Crank-Nicolson time evolution matrix """
        implicit = (sparse.eye(self.n_points) - dt / 2.0j * hamiltonian).tocsc()
        explicit = (sparse.eye(self.n_points) + dt / 2.0j * hamiltonian).tocsc() 
        self.evolution_matrix = ln.inv(implicit).dot(explicit).tocsr()

    def evolve(self):
        self.psi = self.evolution_matrix.dot(self.psi)
        self.prob = abs(self.psi)**2

        norm = sum(self.prob)
        self.prob /= norm
        self.psi /= norm**0.5

        return self.prob

class Animator:
    def __init__(self, wave_packet):
        self.time = 0.0
        self.wave_packet = wave_packet
        self.fig, self.ax = plt.subplots()
        plt.plot(self.wave_packet.x, self.wave_packet.potential * 0.1, color='r')
        
        self.time_text = self.ax.text(0.05, 0.95, '', horizontalalignment='left',
            verticalalignment='top', transform=self.ax.transAxes)
        self.line, = self.ax.plot(self.wave_packet.x, self.wave_packet.evolve())
        self.ax.set_ylim(0, 0.2)
        self.ax.set_xlabel('Position (a$_0$)')
        self.ax.set_ylabel('Probability density (a$_0^{-1}$)')

    def update(self, data):
        self.line.set_ydata(data)
        return self.line,
    
    def time_step(self):
        while True:
            self.time += self.wave_packet.dt
            self.time_text.set_text(
                'Elapsed time: {:6.2f} fs'.format(self.time * 2.419e-2))
          
            yield self.wave_packet.evolve()
    
    def animate(self):
        self.ani = animation.FuncAnimation(
            self.fig, self.update, self.time_step, interval=5, blit=False)


wave_packet = Wave_Packet(n_points=500, dt=0.5, barrier_width=10, barrier_height=1)
animator = Animator(wave_packet)
animator.animate()
plt.show()

Quantum Tunneling, Part 2

In the last installment, we discussed quantum mechanics, set up the problem of a rectangular potential barrier, and used boundary conditions to derive a system of equations that we need to solve to find the tunneling probability. Today, we will proceed with the solution of this set of equations.


To recapitulate, the system of equations to be solved is the following:

\begin{aligned}1 + r &= B_r + B_l &&(1)\\ i k \left(1 - r\right) &= iq\left(B_r - B_l\right) &&(2)\\ B_r e^{i q a} + B_l e^{-i q a} &= t e^{i k a} &&(3)\\ iq\left(B_r e^{iqa} - B_l e^{-iqa}\right) &= ikt e^{ika} &&(4)\end{aligned}

Let’s start by eliminating the parameters B_r and B_l. We can isolate B_r in the first equation (1):

1 + r = B_r + B_l \quad \Rightarrow \quad B_r = (1 + r) - B_l

We then plug this into the second equation (2):

i k \left(1 - r\right) = iq\left[(1 + r) - B_l - B_l\right]

i k \left(1 - r\right) = iq\left(1 + r - 2B_l\right)

i k \left(1 - r\right) - iq\left(1 + r\right) = -2iqB_l

B_l = - \dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right)

We can now use this to find B_r:

B_r \equiv (1 + r) - B_l = 1 + r + \dfrac{k}{2q} \left(1 - r\right) - \dfrac{1}{2}\left(1 + r\right)

B_r = \dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right)


Since we have “used up” the first two equations, only the latter two are left:

\begin{aligned} B_r e^{i q a} + B_l e^{-i q a} &= t e^{i k a} &&(3)\\ iq\left(B_r e^{iqa} - B_l e^{-iqa}\right) &= ikt e^{ika} &&(4) \end{aligned}

Plugging in for B_r and B_l in eq. (3) we have that:

\left[\dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right) \right]e^{i q a} + \left[-\dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right)\right] e^{-i q a} = t e^{i k a}

\left[k \left(1 - r\right) + q\left(1 + r\right) \right]e^{i q a} + \left[-k \left(1 - r\right) + q\left(1 + r\right)\right] e^{-i q a} = 2qt e^{i k a}

where we just multiplied both sides by a factor of 2q.

We can do something similar with eq. (4):

iq\left[\dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right) \right]e^{i q a} - iq\left[-\dfrac{k}{2q} \left(1 - r\right) + \dfrac{1}{2}\left(1 + r\right)\right] e^{-i q a} = i k t e^{i k a}

Multiplying by 2 and dividing by i we get:

\left[k\left(1 - r\right) + q\left(1 + r\right) \right]e^{i q a} - \left[-k\left(1 - r\right) +q\left(1 + r\right)\right] e^{-i q a} = 2 k t e^{i k a}

It’s immediately clear that the first terms on the left hand side are identical between our two equations. It seems natural to add the two equations together, (3) + (4), to get rid of the second terms on the left hand side:

2\left[k\left(1 - r\right) + q\left(1 + r\right) \right]e^{i q a}  = 2 k t e^{i k a} + 2qte^{ika}

\left(k - kr + q + qr \right)e^{i q a}  = (k + q) t e^{i k a}

\left[(q + k) + r(q - k) \right]e^{i (q-k) a}  = (q + k) t

(q + k) + r(q - k) = (q + k) t e^{-i (q - k) a}

r(q - k) = (q + k) t e^{-i (q - k) a}  - (q + k)

r = \dfrac{(q + k)}{(q - k)}\left[ t e^{-i (q - k) a} - 1\right]

At the same time, let us subtract eq. (4) from eq. (3):

2\left[-k\left(1 - r\right) + q\left(1 + r\right) \right]e^{-i q a}  = 2 q t e^{i k a} - 2 k t e^{i k a}

\left(-k + kr + q + qr \right)e^{-i q a}  = (q - k) t e^{i k a}

\left[(q - k) + r(q + k)\right]e^{-i (q + k) a}  = (q - k) t

(q - k) + r(q + k) = (q - k) t e^{i (q + k) a}

r(q + k) = (q - k) t e^{i (q + k) a} - (q - k)

r = \dfrac{(q - k)}{(q + k)}\left[ t e^{i (q + k) a} - 1\right]

We have two expressions for the coefficient of reflection r. Equating them we eliminate r and find the transmission coefficient t:

\dfrac{(q + k)}{(q - k)}\left[ t e^{-i (q - k) a} - 1\right] = \dfrac{(q - k)}{(q + k)}\left[ t e^{i (q + k) a} - 1\right]

Multiplying both sides by (q - k) (q + k):

(q + k)^2 \left[ t e^{-i (q - k) a} - 1\right] = (q - k)^2\left[ t e^{i (q + k) a} - 1\right]

and isolating t:

t\left[ (q + k)^2 e^{-i (q - k) a} - (q - k)^2 e^{i (q + k) a}\right] = (q + k)^2 - (q - k)^2

We can expand the expression on the right hand side, and multiply both sides by e^{i(q - k) a}:

t\left[ (q + k)^2 - (q - k)^2 e^{2 i q a}\right] = \left(q^2 + 2kq + k^2 - q^2 + 2kq - k^2\right)e^{i (q - k) a}

Fortunately, some terms on the right cancel out, and solving for t we find:

t = \dfrac{4kqe^{i(q - k) a}}{(q + k)^2 - (q - k)^2 e^{2 i q a}}

Calculating the Tunneling Probability

The probability that the particle tunnels through the barrier is equal to the modulus squared of the transmission coefficient T \equiv |t|^2. First, let us note that we care about the solution for a particle of energy E lower than the potential barrier height V_0. In such a case, we have that:

k = \sqrt{\dfrac{2mE}{\hbar^2}}\quad\text{and}\quad q = \sqrt{\dfrac{2m(E - V_0)}{\hbar^2}} = i \sqrt{\dfrac{2m(V_0 - E)}{\hbar^2}} \equiv iq'

Let us, therefore, replace q in the expression for the transmission coefficient t with iq' as seen above. And then, for the sake of convenience, we will relabel q' with q, keeping in mind that q is now a real number:

t = \dfrac{4ikqe^{-i k a}e^{-q a}}{(k + iq)^2 - (iq - k)^2 e^{-2q a}}

The tunneling probability is then calculated as:

T = \dfrac{4ikqe^{-i k a}e^{-q a}}{(k + iq)^2 - (iq - k)^2 e^{-2q a}}\cdot \dfrac{-4ikqe^{i k a}e^{-q a}}{(k - iq)^2 - (-iq - k)^2 e^{-2q a}}

T = \dfrac{16k^2q^2e^{-2q a}}{\left[\left(k^2 + 2ikq - q^2\right) - \left(k^2 - 2ikq - q^2\right) e^{-2qa}\right]\cdot\left[\left(k^2 - 2ikq -q^2\right) - \left(k^2 + 2ikq - q^2\right) e^{-2q a}\right]}

By grouping the terms in the denominator we get:

T = \dfrac{16k^2q^2e^{-2q a}}{\left[\left(k^2 - q^2\right)\left(1 - e^{-2qa}\right) + 2ikq\left(1 + e^{-2qa}\right)\right]\cdot\left[\left(k^2 - q^2\right)\left(1 - e^{-2qa}\right) - 2ikq\left(1 + e^{-2qa}\right)\right]}

We now use the fact that for any real a,\, b:\ (a + ib)\cdot(a - ib) = a^2 + b^2 to find:

T = \dfrac{16k^2q^2e^{-2q a}}{\left(k^2 - q^2\right)^2\left(1 - e^{-2qa}\right)^2 + 4k^2q^2\left(1 + e^{-2qa}\right)^2}

Let us multiply both the numerator and the denominator by e^{2qa}:

T = \dfrac{16k^2q^2}{\left(k^2 - q^2\right)^2\left(e^{qa} - e^{-qa}\right)^2 + 4k^2q^2\left(e^{qa} + e^{-qa}\right)^2}

In the denominator we can use the identities for hyperbolic cosines and sines, e^{x} + e^{-x}\equiv 2\cosh{x} and e^{x} - e^{-x}\equiv 2\sinh{x}, respectively:

T = \dfrac{16k^2q^2}{\left(k^2 - q^2\right)^2\,4\sinh^2(qa) + 4k^2q^2\,4\cosh^2(qa)}

T = \dfrac{16k^2q^2}{4\left(k^4 - 2k^2q^2 + q^4\right)\sinh^2(qa) + 16k^2q^2\cosh^2(qa)}

Rearranging the terms:

T = \dfrac{16k^2q^2}{4\left(k^4 + q^4\right)\sinh^2(qa) - 8k^2q^2\sinh^2(qa) + 16k^2q^2\cosh^2(qa)}

we can use the identity \cosh^2(x) - \sinh^2(x) \equiv 1 to rewrite the last term of the denominator:

T = \dfrac{16k^2q^2}{4\left(k^4 + q^4\right)\sinh^2(qa) - 8k^2q^2\sinh^2(qa) + 16k^2q^2\left[1 + \sinh^2(qa)\right]}

T = \dfrac{16k^2q^2}{4\left(k^4 + q^4\right)\sinh^2(qa) + 16k^2q^2 + 8k^2q^2\sinh^2(qa)}

We now merge the last term with the other terms containing the hyperbolic sine:

T = \dfrac{16k^2q^2}{4\left(k^4 +2k^2q^2 + q^4\right)\sinh^2(qa) + 16k^2q^2 }

T = \dfrac{16k^2q^2}{4\left(k^2 +q^2\right)^2\sinh^2(qa) + 16k^2q^2 }

and finally dividing by 16k^2 q^2 we find:

T = \dfrac{1}{\dfrac{\left(k^2 + q^2\right)^2}{4k^2q^2}\sinh^2(qa) + 1 }

Plugging in for k and q:

k = \sqrt{\dfrac{2mE}{\hbar^2}}\quad\text{and}\quad q = \sqrt{\dfrac{2m(V_0 - E)}{\hbar^2}}

we have that:

\dfrac{\left(k^2 + q^2\right)^2}{4k^2q^2} = \dfrac{\left[E + (V_0 - E)\right]^2}{4E(V_0 - E)} = \dfrac{V_0^2}{4E(V_0 - E)}

and hence:

T = \dfrac{1}{1 + \dfrac{V_0^2}{4E(V_0 - E)}\sinh^2\left(\dfrac{\sqrt{2m(V_0 - E)}a}{\hbar}\right)}


That was a lot of LaTeX. Fortunately, this is it — we have found an expression that gives the tunneling probability as a function of three readily available parameters: the energy of the particle E, the height of the potential barrier V_0, and its thickness a.
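
As a quick sanity check of this result, consider two limits. For a vanishing barrier width the hyperbolic sine vanishes and the particle always gets through; for a wide or high barrier, with q \equiv \sqrt{2m(V_0 - E)}/\hbar, the transmission is exponentially suppressed:

a \to 0: \quad \sinh^2(qa) \to 0 \quad\Rightarrow\quad T \to 1

qa \gg 1: \quad \sinh^2(qa) \approx \dfrac{e^{2qa}}{4} \quad\Rightarrow\quad T \approx \dfrac{16E(V_0 - E)}{V_0^2}\,e^{-2qa}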

In the following part we will write a Python program that simulates a particle hitting the potential barrier. Hopefully, the expression derived above can successfully predict the probability of it tunneling through!

Until next time!

Visualizing .xyz Structures in Blender

Whether in a paper, on a poster, or during a presentation — the visual representation of the structures we study is critical. A fine balance must be achieved between impact and clarity. Depending on the occasion, a choice has to be made between a representation that communicates the atomic configuration in a clear and succinct manner, and one that is rather more intriguing and visually striking. But in both cases the final figure should be as pleasing to the eye as possible.

There is a multitude of tools to manipulate, inspect, and render molecules and atomic structures. But as with most tools, the best ones are highly specialized, and probably cannot single-handedly achieve all of the above at the desired level of quality. Today, we will use Python to render a configuration of atoms in Blender.


For a reason that escapes me, many scientists disregard anything to do with science communication. Some even find a sort of perverse pride in outright shunning any attempt at communicating their science, visually or otherwise. As if spending too much effort on making their graphs understandable, their figures attractive, their writing engaging, and their articles accessible might somehow diminish, or dilute, the underlying science. Why that is so is beyond me. But if you have attended a few conferences, or read a few papers, you will have seen figures like the following:

Bumpy spheres, cheap plastic look, default background and lighting, useless axes floating around, and even some clipping artifacts in the foreground. In one word: ugly.

What the above tries to represent is a nickel oxide layer with an iron oxyhydroxide nanoparticle on top. And granted, with an appropriate caption the information is readily accessible. However, the rendering is dismal — just plain ugly.

I used VMD (Visual Molecular Dynamics) to render the graphic. While I greatly enjoy using this application, here I went with the default settings and rendered the user’s view directly, which explains the low quality of the end result. In practice, however, I play around with the settings, lighting, colors, and materials to make the structure clear, and hopefully pleasing to the beholder’s eye as well:

The same structure viewed in an orthographic projection. The bright background, use of vivid colors, and outlines of the atoms make for a clear and accessible presentation of the system.

Personally, I prefer the above orthographic projection, as I find it communicates periodic structures much better than a perspective one. Nevertheless, let’s assume we want a figure for a short conference presentation, or to be featured as a TOC graphic in a publication. This would tip the balance between clarity and visual impact in the favor of the latter. The task is then to create the most appealing representation our limited time allows.

As intimated previously, we can only go so far with a single application or program. While VMD is phenomenal in inspecting structures and trajectories, we are much better off using a dedicated rendering program to render our figure. Cue Blender, a free and open 3D rendering software.

Interfacing Blender and Python

Instead of importing some Blender package into Python and using it to render things in Python, we must open Blender and use a Python script inside of it. Kind of the other way around, but that’s just how it is.

Start up Blender through the console / command prompt. This is important, since any Python errors will be displayed in the console only. Open a new Blender project. Split the view by clicking and dragging the top right corner of the main window to the left. Then, click on the small cube icon on the bottom left of the new panel and choose the Text Editor. You should see a blank, gray document:

You can switch on syntax highlighting by pressing the rightmost of the three icons situated on the same horizontal bar at the bottom of the newly created panel.

Next, import bpy, the package that allows you to interface with Blender. In case you haven’t noticed yet, hovering your cursor over nearly anything will display the appropriate Python function, method, or module in bpy. Now run the (empty) script with alt + p!

Drawing an .xyz Structure

As we’ve discussed in the post Radial Distribution Function, the .xyz file format is a human-readable representation of a configuration of atoms. The first line holds the number of atoms; the second one is reserved for comments and is ignored; each of the remaining lines holds the symbol of the element and the Cartesian coordinates of the atom (in angstrom). We will write a script that reads in the coordinates and draws a sphere centered at each atom’s position. We will also find and draw bonds between nearby atoms.
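
For illustration, a minimal .xyz file describing a single water molecule could look like this (coordinates approximate):

3
water molecule, coordinates in angstrom
O     0.000     0.000     0.000
H     0.757     0.586     0.000
H    -0.757     0.586     0.000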

import bpy
import numpy as np

sizes = { 'Ni' : 0.7, 'Fe' : 0.7, 'O' : 0.5, 'H' : 0.2 }
colors = { 'Ni' : (0.0, 0.0, 0.7), 'Fe' : (0.4, 0.4, 0.7),
            'O' : (1.0, 0.0, 0.0),  'H' : (1.0, 1.0, 1.0), 
            'bond': (0.05, 0.05, 0.05) }
            
for key in colors.keys():
    bpy.data.materials.new(name=key)
    bpy.data.materials[key].diffuse_color = colors[key]
    bpy.data.materials[key].specular_intensity = 0.0

def distance(a, b):
    return np.sqrt(np.dot(a - b, a - b))

def normalize(a):
    return np.array(a) / np.sqrt(np.dot(a, a))

We first create two dictionaries, sizes and colors, which define the size (in angstrom) and color (in RGB), respectively, of the balls representing each element. I chose red for oxygen and white for hydrogen — by convention. The nickel will be dark blue and the iron a sort of steel gray. Next, we create a material for each element with the appropriate color. I set the specular intensity to zero, as I am aiming for a more cartoony, dull, non-shiny look. While we are at it, we define two small helper functions: one to compute the distance between two points represented by numpy arrays; the other to normalize a vector, in the same format.

Next, we will create a class to represent our structure. The only initialization parameter is going to be the path of the .xyz file we want to visualize:

class Structure:
    def __init__(self, filename):
        with open(filename, 'r') as f:
            data = f.readlines()
        
        self.n_atoms = int(data[0].split()[0])
        self.atom_list = [line.split()[0] for line in data[2:]]
        
        coords = []
        for i, line in enumerate(data[2:]):
            coords.append([float(value) for value in line.split()[1:4]])
        
        self.coordinates = np.array(coords)
        self.bonds = set()

    ...

The parsing of the .xyz file follows the same logic laid out in the RDF post. Now that we have read the coordinates of the atoms, we can proceed to identify bonds between them. We adopt here a very basic, purely distance-based approach. As the user, we can specify the list of elements we want to evaluate for bonding, and the cutoff distance for these bonds. For example, bonds between oxygen and hydrogen are quite short, on the order of one angstrom. In contrast, some metal-metal bonds may be two, or even more, angstrom in length.

The bonds are to be represented as a set of tuples holding the indices of the atoms connected by the bond. We use a set because: (a) no bond, once created, is expected to change; (b) each bond should be unique; (c) the order of bonds does not matter. A Python set is the ideal data structure for this job.

Let us now assess bonds. As discussed, we take two arguments: a list of elements, and a distance cutoff. We then carry out a double loop, looping over every pair of atoms. If both atoms are on the list of elements, and their distance is below the cutoff, a bond is added to the set. Quite straightforward:

class Structure:
    ...
    def add_bonds(self, atom_types, cutoff):
        for i in range(self.n_atoms - 1):
            position_1 = self.coordinates[i]
            element_1 = self.atom_list[i]
            
            if element_1 not in atom_types:
                continue
            
            for j in range(i + 1, self.n_atoms):
                position_2 = self.coordinates[j]
                element_2 = self.atom_list[j]
                
                if element_2 not in atom_types:
                    continue
                
                dist = distance(position_1, position_2)
                if dist <= cutoff:
                    self.bonds.add((i, j))
    ...

With both a list of atoms and a list of bonds ready, we can now proceed to represent these in Blender:

class Structure:
    ...
    
    def draw_structure(self):
        self.draw_atoms()
        self.draw_bonds()

    def draw_atoms(self):
        for element, position in zip(self.atom_list, self.coordinates):
            bpy.ops.mesh.primitive_uv_sphere_add(size=sizes[element], location=position)
            bpy.context.active_object.data.materials.append(bpy.data.materials[element])
            bpy.ops.object.shade_smooth()    

A quick draw_structure() method combines the drawing of the atoms and of the bonds between them. The former is straightforward — a sphere is drawn centered at each position in the list of coordinates. Its radius and color (or rather material) are governed by the corresponding element.

The drawing of the bonds is a tad more involved. For each bond we have to draw a cylinder of a height identical to the distance between the bonded atoms, centered at the midpoint between the atoms, and correctly oriented to boot. We proceed as follows. First, we compute the vector pointing from one atom to the other. Next, we find the midpoint of the line connecting the two atoms, as well as the distance between them. As each new cylinder is initially drawn standing vertically, we will need to rotate it. To that end, we compute the angle (via the scalar product) between a unit vector in the vertical direction and the normalized vector in the direction of the bond calculated earlier. We also carry out a cross product of these two vectors to get the appropriate rotation axis. Finally, the cylinder is assigned its respective “bond” material, and rotated in place:

class Structure:
    ...
    def draw_bonds(self):
        for atom_1, atom_2 in self.bonds:
            pos_1 = self.coordinates[atom_1]
            pos_2 = self.coordinates[atom_2]
            
            difference = pos_2 - pos_1
            center = (pos_2 + pos_1) / 2.0
            magnitude = distance(pos_1, pos_2)
            
            bond_direction = normalize(difference)
            vertical = np.array((0.0, 0.0, 1.0))
            rotation_axis = np.cross(bond_direction, vertical)
            angle = -np.arccos(np.dot(bond_direction, vertical))
    
            bpy.ops.mesh.primitive_cylinder_add(radius=0.1, 
                                                depth=magnitude, 
                                                location=center)
            bpy.context.active_object.data.materials.append(bpy.data.materials['bond'])
            bpy.ops.object.shade_smooth()
            bpy.ops.transform.rotate(value=angle, axis=rotation_axis)
    ...


Done. We can now proceed to read an .xyz file, find all relevant bonds, and draw the structure in Blender. Running the script with alt + p should immediately cause an arrangement of atoms to materialize in the adjacent window:

NiO2_FeO2 = Structure('NiO2_FeO2.xyz')
NiO2_FeO2.add_bonds(('Ni', 'O', 'Fe'), 2.4)
NiO2_FeO2.add_bonds(('O', 'H'), 1.5)
NiO2_FeO2.draw_structure()

Making it Look Nice

I am not a Blender expert — chances are, you can produce much nicer renderings of the structure. Here, I will just describe what I did to get my cartoony result. If you feel like having shiny reflecting atoms, or want to add lights, change the perspective, or play around with fancy post-processing effects, be my guest. That is actually what I wanted to achieve — once the structures are imported in Blender, you have the freedom to do what you want!

For all materials, I switched from Lambertian shading to the Fresnel algorithm. In the world settings, I played around with the background color, producing a subtle gradient of grays. Next, I increased the intensity of the ambient light to soften the shadows. Finally, I carried out an edge detection to paint the outlines of the atoms black. Et voilà:

Until next time!


For the sake of convenience, I reproduce here the whole script:

import bpy
import numpy as np

sizes = { 'Ni' : 0.7, 'Fe' : 0.7, 'O' : 0.5, 'H' : 0.2 }
colors = { 'Ni' : (0.0, 0.0, 0.7), 'Fe' : (0.4, 0.4, 0.7),
            'O' : (1.0, 0.0, 0.0),  'H' : (1.0, 1.0, 1.0), 
            'bond': (0.05, 0.05, 0.05) }
            
for key in colors:
    bpy.data.materials.new(name=key)
    bpy.data.materials[key].diffuse_color = colors[key]
    bpy.data.materials[key].specular_intensity = 0.1

def distance(a, b):
    return np.sqrt(np.dot(a - b, a - b))

def normalize(a):
    return np.array(a) / np.sqrt(np.dot(a, a))

class Structure:
    def __init__(self, filename):
        with open(filename, 'r') as f:
            data = f.readlines()
        
        self.n_atoms = int(data[0].split()[0])
        
        self.atom_list = [line.split()[0] for line in data[2 : self.n_atoms + 2]]
        
        coordinates = []
        
        for line in data[2 : self.n_atoms + 2]:
            coordinates.append([float(value) for value in line.split()[1:4]])
        
        self.coordinates = np.array(coordinates)
        self.bonds = set()
    
    def add_bonds(self, atom_types, cutoff):
        for i in range(self.n_atoms - 1):
            position_1 = self.coordinates[i]
            element_1 = self.atom_list[i]
            
            if element_1 not in atom_types:
                continue
            
            for j in range(i + 1, self.n_atoms):
                position_2 = self.coordinates[j]
                element_2 = self.atom_list[j]
                
                if element_2 not in atom_types:
                    continue
                
                dist = distance(position_1, position_2)
                if dist <= cutoff:
                    self.bonds.add((i, j))
    
    def draw_bonds(self):
        for atom_1, atom_2 in self.bonds:
            pos_1 = self.coordinates[atom_1]
            pos_2 = self.coordinates[atom_2]
            
            difference = pos_2 - pos_1
            center = (pos_2 + pos_1) / 2.0
            magnitude = distance(pos_1, pos_2)
            
            bond_direction = normalize(difference)
            vertical = np.array((0.0, 0.0, 1.0))
            rotation_axis = np.cross(bond_direction, vertical)
            angle = -np.arccos(np.dot(bond_direction, vertical))
    
            bpy.ops.mesh.primitive_cylinder_add(radius=0.1, 
                                                depth=magnitude, 
                                                location=center)
            bpy.context.active_object.data.materials.append(bpy.data.materials['bond'])
            bpy.ops.object.shade_smooth()
            bpy.ops.transform.rotate(value=angle, axis=rotation_axis)
    
    def draw_atoms(self):
        for element, position in zip(self.atom_list, self.coordinates):
            bpy.ops.mesh.primitive_uv_sphere_add(size=sizes[element], location=position)
            bpy.context.active_object.data.materials.append(bpy.data.materials[element])
            bpy.ops.object.shade_smooth()    
    
    def draw_structure(self):
        self.draw_atoms()
        self.draw_bonds()

NiO2_FeO2 = Structure('NiO2_FeO2.xyz')
NiO2_FeO2.add_bonds(('Ni', 'O', 'Fe'), 2.4)
NiO2_FeO2.add_bonds(('O', 'H'), 1.5)
NiO2_FeO2.draw_structure()

Quantum Tunneling, Part 1

In this Python project, we will look at quantum tunneling. Even if you are completely new to quantum mechanics, you must have heard of this mysterious, quantum property of matter. Particles can, apparently, move through solid barriers! What is this quantum woo all about? This, we will find out by solving a standard textbook problem — that of a one-dimensional potential barrier. Our goal will be to derive the tunneling probability. We will then implement a simulation of a particle near such a potential barrier and look at the numerical solution of the problem.

The Schrödinger equation

What do we do when we want to study the evolution of a macroscopic physical system — a bullet, a ping-pong ball, the Solar system — in time? We reach for an equation of motion, such as Newton's second law. To describe a system in classical mechanics, we only need the equation of motion and two pieces of data: the initial position and the initial velocity of all parts of the system. Armed with these, we can predict the state of the system at any arbitrary point in time.

Quantum mechanics differs from classical mechanics in the equation of motion and the required initial conditions. A quantum particle is described not by its position and velocity, but rather by a complex function — called the wave function — spread out over all of space. Its motion (or change), in turn, is dictated by Schrödinger’s equation of motion:

\hat{H}\Psi(\mathbf{r}, t) = i \hbar \frac{\partial}{\partial\,t} \Psi

The capital H with the fancy hat is the Hamiltonian operator. It is related to the Hamiltonian we discussed some time ago in relation to the classical Hamiltonian description of the double pendulum. The difference here is that instead of just a scalar number (which can be interpreted as the total energy of the system in certain cases), it is an operator — a mathematical entity that acts on a wave function and changes it, producing a different wave function.

Furthermore, the above equation is clearly a first-order partial differential equation in time. Hence, only the initial state of the system is required as the initial condition.

Finally, it would be nice to know the shape of the Hamiltonian operator. Generally, one can go from classical mechanics to quantum mechanics just by putting hats on the quantities (yes, it’s often that easy). For example, a classical particle in a spatially changing potential (think electric potential energy or gravitational potential) has the following Hamiltonian:

H = \frac{\mathbf{p}^2}{2m} + V(\mathbf{r})

The corresponding quantum-mechanical Hamiltonian operator is:

\hat{H} = \frac{\hat{\mathbf{p}}^2}{2m} + \hat{V}(\mathbf{r})

Now we just plug in the expressions of the position and momentum operators in position space (read: regular space): \hat{\mathbf{r}} = \mathbf{r} and \hat{\mathbf{p}} = -i \hbar \nabla. We find:

\hat{H} = - \frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})

Our Schrödinger equation therefore looks like this:

\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\Psi = i \hbar \frac{\partial}{\partial\,t} \Psi

What are its solutions \Psi? Since the Hamiltonian on the left hand side is independent of time, we can try to separate variables; the right hand side then reduces to a simple first-order differential equation in time. It makes sense, then, to look for a solution of the form:

\Psi(\mathbf{r}, t) = \psi(\mathbf{r})e^{-i\frac{E}{\hbar}t}

Plugging this Ansatz into our equation we have that:

e^{-i\frac{E}{\hbar}t}\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi(\mathbf{r}) = i \hbar \psi(\mathbf{r})  \frac{\partial}{\partial\,t} e^{-i\frac{E}{\hbar}t}

Carrying out the time derivative and dividing by the complex exponential, we get:

\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi(\mathbf{r}) = E \psi(\mathbf{r})

This is called the time-independent Schrödinger equation. Its solutions are called eigenstates — they represent wave functions that, upon action of the Hamiltonian operator, remain identical up to a scalar multiplicative coefficient: the eigenvalue. This scalar value can in turn be identified with the total energy of the corresponding state. In simpler terms, think of the eigenstates as the basic building blocks of the solution, while the eigenvalues are the corresponding energies. Let us now solve the Schrödinger equation for a model system illustrating the essence of quantum tunneling!

Rectangular Potential Barrier

Consider a one-dimensional potential barrier described by a potential V(x):

V(x) = \begin{cases} 0\ &\text{if}\ x < 0 \\ V_0\ &\text{if}\ 0 \leq x \leq a \\ 0\ &\text{if}\ x > a\end{cases}

This naturally splits the physical system into three parts. Let us designate the solution for all x < 0 by \psi_{\text{I}}(x), for 0 \leq x \leq a by \psi_{\text{II}}(x), and for all even greater x by \psi_{\text{III}}(x).

The time-independent Schrödinger equation for region \text{I} is:

-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\psi_{\text{I}}(x) = E \psi_{\text{I}}(x)

the solution of which is trivial. In the absence of a potential, the solution is a linear combination of sines and cosines. However, we might instead prefer the complex exponential form:

\psi_{\text{I}}(x) = A_r e^{ikx} + A_l e^{-ikx}

Plugging this into the left hand side of our equation of motion we find that:

-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}\left( A_r e^{ikx} + A_l e^{-ikx} \right) = -\frac{\hbar^2}{2m} \left(-k^2A_re^{ikx} - k^2 A_l e^{-ikx}\right) = \frac{\hbar^2k^2}{2m} \psi_{\text{I}}(x)

and thus:

\frac{\hbar^2k^2}{2m} \psi_{\text{I}}(x)  = E\psi_{\text{I}}(x) \quad \Rightarrow \quad k = \sqrt{\frac{2mE}{\hbar^2}}


The solution in region \text{III} is, naturally, very similar:

\psi_{\text{III}}(x) = C_r e^{ikx} + C_l e^{-ikx}


Finally, we can look at the region of space inside the potential barrier. While the potential here is still constant, it is non-zero. This, however, only results in a change of the wavenumber; the solution itself is mathematically the same as in the other regions. We have that:

\psi_{\text{II}}(x) = B_r e^{iqx} + B_l e^{-iqx}

where:

q = \sqrt{\frac{2m(E - V_0)}{\hbar^2}}


How can we interpret the above solution? In the regions on either side of the potential barrier, the solution represents a superposition of harmonic waves travelling in either direction. Inside the potential barrier, the solution depends on the energy of the particle compared to the barrier height. If E > V_0, the particle “flies above” the barrier. This has a direct analogy in classical physics, like a ball flying above a physical wall. However, if E < V_0, the wavenumber q is imaginary, and the solution becomes a superposition of real exponentials. While the decaying exponential falls off quickly, it still leads to a non-zero wave function (and thus probability density) inside the potential barrier! Particles flying through walls!
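
To make this explicit, write q = i\kappa with:

\kappa = \sqrt{\frac{2m(V_0 - E)}{\hbar^2}}

so that inside the barrier:

\psi_{\text{II}}(x) = B_r e^{-\kappa x} + B_l e^{\kappa x}

The wave function thus decays (rather than oscillates) inside the barrier, but it does not vanish.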


The solution presented just now includes several free parameters. In total, there are seven of them: A_r,\, A_l,\, B_r,\, B_l,\, C_r,\, C_l, and the energy of the particle E. On the other hand, the wave function has to satisfy certain boundary conditions. For one thing, it must be a continuous function of the spatial coordinate, which gives us one condition at each side of the barrier, two in total. For another, the derivative of the wave function must also be continuous, yielding another two conditions. Both the continuity of the wave function and that of its derivative stem from the mathematical structure of the Schrödinger equation — it is a second-order differential equation in space, and as long as the potential is finite, both continuity conditions must hold. Finally, we have the normalization condition. Unlike in the case of spatially confined systems, where the integral of the squared wave function is a finite number we can normalize to one, here we must use a different normalization criterion. However, the exact choice does not matter; the important point is that we get a fifth constraint.

This leaves us with 7 parameters and 5 constraints. Hence, the energy spectrum of the solution is continuous and doubly degenerate. In other words: for any choice of energy E, there exist two solutions, representing a particle moving in either direction.


Now, let’s see where we stand. The solution to the problem is:

\begin{aligned}\psi_{\text{I}}(x) &= A_r e^{ikx} + A_l e^{-ikx} \\  \psi_{\text{II}}(x) &= B_r e^{iqx} + B_l e^{-iqx}  \\  \psi_{\text{III}}(x) &= C_r e^{ikx} + C_l e^{-ikx} \end{aligned}

The continuity of the wave function and of its derivative yields four boundary conditions:

\begin{aligned} \psi_{\text{I}}(0) &= \psi_{\text{II}}(0)   \\ \left. \frac{\partial\psi_{\text{I}}}{\partial x}\right|_0 &= \left.  \frac{\partial \psi_{\text{II}}}{\partial x}\right|_0 \\ \psi_{\text{II}}(a) &= \psi_{\text{III}}(a) \\  \left. \frac{\partial \psi_{\text{II}} }{\partial x}\right|_a  &=  \left. \frac{\partial \psi_{\text{III}} }{\partial x}\right|_a    \end{aligned}

Plugging in the expressions for the wave functions \psi we get:

\begin{aligned}A_r  + A_l  &= B_r + B_l \\  i k \left(A_r - A_l\right) &= iq\left(B_r - B_l\right) \\ B_r e^{i q a} + B_l e^{-i q a} &= C_r e^{i k a} + C_l e^{-i k a} \\ iq\left(B_r e^{iqa} - B_l e^{-iqa}\right)&=  ik\left(C_r e^{ika} - C_l e^{-ika}\right)  \end{aligned}

This is the system of equations we need to solve. However, this represents a very general case.

Let us consider a particle moving in from the left with a unit amplitude A_r \equiv 1. We will neglect the case of a particle moving towards the barrier from the right by setting C_l \equiv 0. We now rename the parameter A_l \equiv r, as it represents the reflected amplitude, and C_r \equiv t, since this corresponds to the transmitted part. We now have:

\begin{aligned}1  + r &= B_r + B_l \\  i k \left(1 - r\right) &= iq\left(B_r - B_l\right) \\ B_r e^{i q a} + B_l e^{-i q a} &= t e^{i k a} \\ iq\left(B_r e^{iqa} - B_l e^{-iqa}\right)&=  ikt e^{ika}  \end{aligned}

Four equations, four parameters. We can get rid of B_r and B_l, and find the expressions for the reflected and transmitted amplitudes. Their squared magnitudes, |r|^2 and |t|^2, give us the reflection and transmission coefficients, respectively.
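
Before solving the system analytically, we can sanity-check it numerically. The following sketch is not part of the derivation — the parameter values are made up, and I work in units where \hbar = m = 1. It simply assembles the four equations into a linear system for the unknowns (r, B_r, B_l, t) and solves it with numpy:

import numpy as np

# illustrative parameters in units where hbar = m = 1
E, V0, a = 0.5, 1.0, 1.0

k = np.sqrt(2.0 * E)
q = np.sqrt(2.0 * (E - V0) + 0j)      # imaginary for E < V0

# unknowns ordered as (r, B_r, B_l, t); rows are the four boundary conditions
M = np.array([
    [1.0, -1.0, -1.0, 0.0],
    [-1j * k, -1j * q, 1j * q, 0.0],
    [0.0, np.exp(1j * q * a), np.exp(-1j * q * a), -np.exp(1j * k * a)],
    [0.0, 1j * q * np.exp(1j * q * a), -1j * q * np.exp(-1j * q * a),
     -1j * k * np.exp(1j * k * a)],
], dtype=complex)
b = np.array([-1.0, -1j * k, 0.0, 0.0], dtype=complex)

r, B_r, B_l, t = np.linalg.solve(M, b)
print(abs(r)**2, abs(t)**2, abs(r)**2 + abs(t)**2)   # last value should be ~1

For these (arbitrary) parameters, the reflection and transmission coefficients add up to one, as they must.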


As this post is getting quite long, let me stop here. In the next instalment we shall solve the above system of equations, find the two parameters r and t, and calculate the probability with which the particle tunnels through our potential barrier.

Until next time!

Radial Distribution Function

For my research, I study semiconductor / water interfaces, much like this one:

One of the methods I use is ab-initio molecular dynamics. The idea is rather simple: the nuclei are treated as classical particles, propagated in space and time using the Verlet algorithm. At each time step, the electronic structure of the current atomic configuration is calculated using density functional theory (DFT), which, in principle, should yield the exact electronic density of the ground state of the system. Next, the forces on the nuclei are calculated from the newly computed distribution of electrons. And finally, these forces are used to propagate the nuclei another step forward in time.
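
As a quick illustration of the propagation part, here is a generic sketch — not the actual production code — where compute_forces is a hypothetical stand-in for the expensive DFT step:

import numpy as np

def verlet_step(x_prev, x_curr, forces, masses, dt):
    """ one position-Verlet step: x_new = 2x - x_prev + (F/m) dt^2 """
    return 2.0 * x_curr - x_prev + (forces / masses[:, None]) * dt**2

# at every step, fresh forces would come from a DFT calculation:
# for step in range(n_steps):
#     forces = compute_forces(x_curr)    # the expensive, first-principles part
#     x_prev, x_curr = x_curr, verlet_step(x_prev, x_curr, forces, masses, dt)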

Due to the inherently high computational cost of first-principles methods such as DFT, we can afford to simulate only so many atoms. Careful tests have to be carried out to ensure that our simulation box is large enough to faithfully replicate the much larger system we are trying to model — at least from the perspective of the studied phenomenon. Among other things, we need to verify that the layer of water representing the solvent is sufficiently thick to allow the water to behave as a liquid. Cram too few water molecules between the periodically repeated interfaces, and the water remains frozen, bound at the interface.

In today’s post, we will write a script to investigate the radial distribution function of the oxygen atoms in water. This is a key indicator of the liquid-like quality of the simulated water layer. Let’s hop into it!

Radial Distribution Function

So what is this radial distribution function? As it often so happens, a picture is worth more than a thousand words:

The radial distribution function (RDF), pair correlation function, or often just g of r, describes how the density of particles varies as a function of the distance from a reference particle. In the picture above, the oxygen-oxygen RDF for liquid water is shown. Imagine you are the oxygen atom of any given H2O molecule. If you observe your surroundings over some period of time, how far away, on average, are other water molecules (or their oxygen atoms, to be more specific)? At distances below 2.5 angstrom (0.25 nanometers), you are extremely unlikely to see any other oxygen atom. Water molecules simply do not want to be that close, which is the reason for the remarkable incompressibility of liquid water. At around 2.7 Å, a sharp peak in the RDF is observed. That’s the first solvation shell. While water molecules are strongly repulsive at short distances, a powerful network of hydrogen bonds binds them together. If you average over each angle, and over the neighborhood of every water molecule, these hydrogen bonds manifest themselves in a sharp increase in the probability of finding water molecules at a distance of about 2.7 Å. Above this distance, the RDF drops, and then oscillates a bit, before finally converging to a value of 1. The interpretation is as follows. If \rho is the macroscopic density of water (1 g/cm³), then \rho g(r) is the local density observed at a distance r from the reference point. It makes sense that in a disordered system like liquid water, the local water density at large distances should approach the macroscopic value, hence g(r) \rightarrow 1.
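
In formula form (one standard way to define it, valid for small \Delta r): if \langle n(r) \rangle is the average number of particles a reference particle sees between distances r and r + \Delta r, and \rho is the average number density, then:

g(r) = \frac{\langle n(r) \rangle}{4 \pi r^2 \Delta r \, \rho}

A value of 1 therefore means the local density equals the bulk average.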

The Algorithm

The goal here is simple. Given a molecular dynamics trajectory — a series of atomic configurations at each time step — compute the oxygen-oxygen RDF. Comparison of the calculated RDF with that of liquid water will tell us how close (or not) we are to simulating the proper behaviour of the water layer. How do we go about it?

First, let’s start with an oxygen atom of our choice as the reference. We will iterate over every other water-molecule-oxygen atom and calculate the distance. We will then create a histogram of the oxygen atom count as a function of distance. Repeat this for every water oxygen atom, and average over every frame. This will give us the distribution of oxygen atoms as a function of their distance. Naturally, the greater the distance, the larger the volume of the spherical shell corresponding to a given bin of the histogram. We need to divide the count of the oxygen atoms by the volume of these spherical shells to obtain something akin to a number density. From there, it is a matter of simple normalization to arrive at the final radial distribution function. The following series of graphics illustrates the process:

There are two points to keep in mind. First of all, the simulation cell is periodically repeated. Hence, when evaluating distances between two atoms, we have to take the cell parameters into account. Two atoms at the opposing ends of a simulation box are, technically, quite close to each other (or rather, close to their respective periodic image). Second, when calculating the volume of the spherical shell corresponding to a given distance, we need to account for the finite size of the water region. Since water molecules are present in a portion of the simulation box only, we should just consider the volume of the part of the spherical shell that falls within this subspace.
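
Before we handle these two complications, here is a minimal sketch of the naive version of the algorithm for an isolated, non-periodic cloud of points (plain numpy, hypothetical inputs); everything in the script below refines this skeleton:

import numpy as np

def naive_rdf(positions, rho, r_max, n_bins):
    """ naive RDF for a non-periodic cloud of points;
        positions: (n, 3) array, rho: number density used to normalize """
    n = len(positions)
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            if d < r_max:
                counts[int(d / r_max * n_bins)] += 2.0   # each pair counts twice
    # volume of each spherical shell, then normalize to the bulk density
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    return counts / (n * rho * shell_volumes)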

The Script

Let’s start simple with a function to calculate the distance between two particles represented by arrays of three coordinates:

import matplotlib.pyplot as plt 
import scipy as sp

def distance(a, b):
    dx = abs(a[0] - b[0])
    x = min(dx, abs(A - dx))
    
    dy = abs(a[1] - b[1])
    y = min(dy, abs(B - dy))
    
    dz = abs(a[2] - b[2])
    z = min(dz, abs(C - dz))

    return sp.sqrt(x**2 + y**2 + z**2)

This way, we correctly account for the periodic images of the atoms we sample. Note that A, B, and C are the cell parameters of the computational cell, which we assume to be orthorhombic. I know, I know, global variables are evil. Deal with it.
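
A quick sanity check (with made-up cell parameters) shows the minimum image convention at work:

A, B, C = 10.0, 10.0, 10.0
a = sp.array([0.5, 5.0, 5.0])
b = sp.array([9.5, 5.0, 5.0])
print(distance(a, b))    # prints 1.0, not 9.0: the periodic image is closer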


Next, we implement a function to calculate the volume of a sphere of a given radius. Recall, however, that we need to take the size of the water region into account. We will assume that the water layer is sandwiched between two horizontal semiconductor interfaces, so only a certain range of vertical coordinates is admissible. When we calculate the volume of the sphere, we restrict ourselves to this range of the vertical coordinate; anything below or above gets cut off.
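
The cutting relies on the volume of a spherical cap. For reference, a cap of height h sliced off a sphere of radius r has volume:

V_{\text{cap}} = \pi h^2 \left( r - \frac{h}{3} \right)

which is exactly the term subtracted in the code below: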

def volume(z, r, z_bot, z_top):
    """ volume of a sphere of radius r centered at height z,
        truncated to the slab between z_bot and z_top """
    volume = 4.0 / 3.0 * sp.pi * r**3
    
    # cut off a spherical cap if the sphere extends below z_bot
    if z - r < z_bot:
        h = z_bot - (z - r)
        volume -= sp.pi * h**2 * (r - h / 3.0)

    # cut off a spherical cap if the sphere extends above z_top
    if z + r > z_top:
        h = z + r - z_top
        volume -= sp.pi * h**2 * (r - h / 3.0)
    
    return volume


With these two helper functions down, we go on to implement the Trajectory class:

class Trajectory:
    def __init__(self, filename, skip, z_bot_interface, z_top_interface,
                 interface_offset=4.0, resolution=200):

        self.filename = filename
        self.skip = skip
        self.z_top = z_top_interface - interface_offset
        self.z_bot = z_bot_interface + interface_offset
        self.resolution = resolution
        
        self.parse_input()
        self.compute_volume_per_h2o()

    ...

The input file, passed to our class through the filename argument, is expected to be of the .xyz format: a simple, human-readable file format. The first line holds the number of atoms. The second line is reserved for comments. Then, for each atom, there is one line consisting of the chemical symbol of the atom type, followed by three floating point numbers representing the x, y, and z coordinates of the atom in angstrom.
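
Here is a minimal example of what a single frame of such a file might look like (the values are made up for illustration):

3
comment line, often the frame number or the energy
O   0.000   0.000   0.000
H   0.757   0.586   0.000
H  -0.757   0.586   0.000

In a trajectory, one such block simply follows another, frame after frame. Knowing this, we will implement the parse_input() method as follows: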

class Trajectory:
    ...
    def parse_input(self):
        with open(self.filename, 'r') as f:
            data = f.readlines()

        self.n_atoms = int(data[0].split()[0])
        self.n_steps_total = len(data) // (self.n_atoms + 2)
        self.n_steps = self.n_steps_total // self.skip
        
        self.atom_list = [line.split()[0] for line in 
                          data[2 : self.n_atoms + 2]]

        self.coordinates = sp.zeros((self.n_steps, self.n_atoms, 3))
        for step in range(self.n_steps):
            coords = sp.zeros((self.n_atoms, 3))
            
            i = step * self.skip * (self.n_atoms + 2)
            for j, line in enumerate(data[i + 2 : i + self.n_atoms + 2]):
                coords[j, :] = [float(value) for value in line.split()[1:4]]
            
            self.coordinates[step] = coords

    ...

Let’s go step by step here. First, we read the whole file into data. While this approach is rather convenient, the input trajectories can be very large, on the order of hundreds of megabytes — even several gigabytes. In such a case, it would be much more efficient to parse the file line by line on the fly. We are not going to worry about such things now.

Next, we read in the number of atoms. We do that by taking the first line of the trajectory, splitting it into words (to get rid of the end-of-line character), taking the first word, and converting it to an integer. We then calculate the number of steps by dividing the length of the file by the number of atoms, not forgetting the 2 header lines per configuration. Finally, we define the n_steps member variable as a subset of the total number of steps. Since the distribution of water molecules barely changes from one step to the next (remember, we need a short time step to capture the oscillations of the light hydrogen nuclei), we will get a perfectly neat RDF by considering only every 10th, or even every 100th snapshot. If we need full accuracy and / or better statistics, we can decrease skip to 1.

Finally, we parse the actual data itself. We need to read the chemical symbols of the atoms just once. As for the coordinates, we create a member variable in the form of a three-dimensional array holding the x, y, and z coordinates of each atom at each step of our subset of the trajectory. Some list comprehension magicks are happening here, but by this point you should be able to comprehend what is going on.


With the coordinates parsed, we can go on to calculate an important quantity: the volume (in cubic angstroms) per water molecule. We will use this number to normalize our RDF such that it approaches a value of one at large distances. Moreover, we can compare the computed value with the experimental one to determine how close our simulated water is to the real deal:

class Trajectory:
    ...
    
    def compute_volume_per_h2o(self):
        n_h2o = 0
        for step in range(self.n_steps):
            for i, atom in enumerate(self.coordinates[step]):
                z = atom[2]
                if self.atom_list[i] == "O" and self.z_bot < z < self.z_top:
                    n_h2o += 1
        
        volume = A * B * (self.z_top - self.z_bot)
        average_n_h2o = n_h2o / self.n_steps
        self.volume_per_h2o = volume / average_n_h2o

    ...

We simply loop over every snapshot and count oxygen atoms that lie within our water region. By the way — did you know you can do comparisons of the type a < x < b in Python? Now you know.
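
As a point of reference (simple arithmetic, not output from the script): bulk liquid water at ambient conditions has a molar volume of about 18 cm³/mol, which works out to:

\frac{18\ \text{cm}^3/\text{mol}}{6.022 \times 10^{23}\ \text{mol}^{-1}} \approx 30\ \text{Å}^3

per molecule, so a computed volume_per_h2o in that neighborhood signals a realistic water density.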


Now that the preliminaries are sorted out, we can implement the algorithm to compute the oxygen-oxygen RDF described above:

class Trajectory:
    ...
    
    def compute_radial_distribution(self):
        # no reason to go above half of the smallest lattice parameter,
        # as mirror images start to be double-counted
        r_cutoff = min(A, B) / 2.0

        dr = r_cutoff / self.resolution
        self.radii = sp.linspace(0.0, self.resolution * dr, self.resolution)

        volumes = sp.zeros(self.resolution)
        self.g_of_r = sp.zeros(self.resolution)
        
        for step in range(self.n_steps):
            print('{:4d} : {:4d}'.format(step, self.n_steps))
            
            # isolate all liquid water molecules based on their oxygens
            data_oxygen = []
            for i, atom in enumerate(self.coordinates[step]):
                if self.atom_list[i] == "O":
                    if self.z_bot < atom[2] < self.z_top:
                        data_oxygen.append(atom)
            data_oxygen = sp.array(data_oxygen)
            
            for i, oxygen1 in enumerate(data_oxygen):
                # calculate the volumes of the cut spherical shells
                for j in range(self.resolution):
                    r1 = j * dr
                    r2 = r1 + dr
                    v1 = volume(oxygen1[2], r1, self.z_bot, self.z_top)
                    v2 = volume(oxygen1[2], r2, self.z_bot, self.z_top)
                    volumes[j] += v2 - v1

                # loop over unique pairs of O atoms; each pair counts as 2
                for oxygen2 in data_oxygen[i + 1:]:
                    dist = distance(oxygen1, oxygen2)
                    index = int(dist / dr)
                    if 0 < index < self.resolution:
                        self.g_of_r[index] += 2.0
        
        for i, value in enumerate(self.g_of_r):
            self.g_of_r[i] = value * self.volume_per_h2o / volumes[i]

    ...

First, we initialize the arrays representing the volumes of the spherical shells, the horizontal axis of the RDF, and the RDF itself. We only calculate the RDF up to a cutoff distance. In principle, we could consider distances up to the lattice parameters; beyond that, pairs of periodic images would pollute the RDF and lead to nonsensical artifacts. However, already at distances larger than one half of the smallest lattice parameter, some periodic images of oxygen atoms start to be double-counted. We thus restrict ourselves to distances that correspond to relevant data, and choose the conservative cutoff of one half of the smallest lattice parameter.

Next, we loop over all oxygen atoms and isolate those that lie within the sampled bulk water layer. For each of these oxygen atoms we calculate the volumes of the (cut) spherical shells corresponding to all sampled distances.

We then loop over every oxygen atom pair, calculate their distance, and build up a histogram of the count vs. distance. We can speed up the algorithm by only looping over unique pairs, and counting each pair as two.

Finally, normalizing by the volumes of the spherical shells, and the volume per water molecule computed earlier, we arrive at the radial distribution function!


Now we just write a quick method to plot the result:

class Trajectory:
    ...

    def plot(self, filename=""):
        """ plots the radial distribution function;
            if filename is specified, also saves the plot as a png """
        
        if not self.g_of_r.any():
            print('compute the radial distribution function first\n')
            return
        
        plt.xlabel('r (Å)')
        plt.ylabel('g$_{OO}$(r)')
        plt.plot(self.radii, self.g_of_r)
        
        if filename:
            plt.savefig(filename, dpi=300, bbox_inches='tight', format='png')
        
        plt.show()

… and run the script like this:

A = 12.991
B = 11.832
C = 30.577
bottom_interface = 23.0
top_interface = 40.0

TiO2_H2O = Trajectory('TiO2_H2O-pos-1.xyz', 10, 
                      bottom_interface, top_interface)
TiO2_H2O.compute_radial_distribution()
TiO2_H2O.plot()

Et voilà:

Sure, the curve is a bit noisy. I used only around a thousand snapshots of a rather short molecular dynamics trajectory. More data means a smoother curve — hopefully one approaching the oxygen pair distribution of liquid water even more closely. But it’s apparent that the main features of the simulated water layer and those of bulk liquid water are very similar: the principal peak is at the correct distance and of the same height, and even the long-range behavior is near-identical. Success!!

Until next time!