Analysis B

Analysis Techniques for Computational Material Science

Computational Material Science

modelling system under periodic boundary conditions
describing structural properties
compute thermal and mechanical properties
virtual screening and development of advanced functional materials:
- anodes, cathods
- solar cells
- catalysts
- bioactive glasses
- sensors
- alloy materials etc.

Analysis of Data

It is important to be aware of the accuracy and limitations of the computational methods employed in order to ensure the reliability of the results obtained.

Question

What do you think are the limitations of a NNP?

Answer

Modelling something outside of the training set can be challenging
Need for GPUS

Analysis of Data

It is important to be aware of the accuracy and limitations of the computational methods employed in order to ensure the reliability of the results obtained.

Question

How accurate are NNP methods?

Answer

The degree of accuracy is highly depending on the quality and size of the training set.

Pre-Analysis

Simulation finished without errors: no abort, no error messages appeared

Visual inspection of the system: system behaves properly

Equilibrium state of simulation

Visual Inspection

Did the system behave as expected?

Open the *.xyz file with VMD.

It is a good sign if

no unintended bond breaking or formation
system remains intact -> no collapsing or blowing up
no translation of the whole box or super fast spinning of a molecule

Troubleshooting

In the event that the simulation did not produce the anticipated results:

Try to understand any error messages that may have been raised
Check the *.log and *.in file to see if the simulation was set up properly
Check the starting structure
Check the moldescriptor file and shake file

Equilibrium State of Simulation

First, in order to obtain meaningful physical properties from a simulation, it is essential to ensure that the system is in an equilibrium state.

Question

However, what constitutes an equilibrium state?

Answer

Equilibrium state can be defined as a state in which macroscopic properties (e.g temperture \(T\), pressure \(P\), energies \(E\), volume \(V\),…) do NOT undergo major changes over time.

Once these properties fluctuate around a constant pattern without any long-term trends, it can be assumed that the system has reached an equilibrium state.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
import matplotlib as mpl

mpl.rcParams["font.size"] = 15
mpl.rcParams["axes.labelsize"] = 20


physical_properties = ["SIMULATION-TIME", "TEMPERATURE", "PRESSURE", "E(TOT)", "E(QM)", "N(QM-ATOMS)", "E(KIN)", "E(INTRA)","VOLUME","DENSITY","MOMENTUM","LOOPTIME"]
physical_units = ["fs", "K","bar", "kcal/mol", "kcal/mol", "-","kcal/mol","kcal/mol","A^3", "g/cm^3", "amuA/fs", "s"]
data = pd.read_csv("../data/physical_properties.en",sep='\t+',names=physical_properties)
# print(data.head())
# print("Shape of data: ", data.shape)
time = data["SIMULATION-TIME"]/1e6 # ns
temperature_avg = np.mean(data["TEMPERATURE"])
temperature_std = np.std(data["TEMPERATURE"])
window = 50
gap = 2
running_average = data["TEMPERATURE"].rolling(window=window).mean() #window=window,min_periods=1,center=True,step=gap

def linear_function(x,a,b):
  return a*x+b



plt.figure(figsize=(8,4))
plt.plot(time,data["TEMPERATURE"],color="black", alpha=0.5,label=r"simulation")
plt.plot(time,running_average,label=r"$T_{\mathrm{run.avg}}$")
plt.hlines(temperature_avg,xmin=np.min(time),xmax=np.max(time),linestyles="solid",color="black",label=r"$T_{\mathrm{avg}}$")
plt.hlines(temperature_avg+temperature_std,xmin=np.min(time),xmax=np.max(time),linestyles="dashed",color="black",label=r"$T_{\mathrm{std}}$")
plt.hlines(temperature_avg-temperature_std,xmin=np.min(time),xmax=np.max(time),linestyles="dashed",color="black")

plt.xlabel(r"$t$ / ns")
plt.ylabel(r"$T$ / K")
plt.legend(fontsize=12)
plt.show()

Linear / Volumetric Thermal Expansion Coefficient

The thermal expansion coefficient is a quantity that describes the extend to which a solid substance undergoes a change in response to variations in temperature.

Therefore the system is computed in \(NPT\) ensemble at different temperatures \(T_i\) at \(1\,\mathrm{bar}\).

For our case we consider a temperature range from \(248 - 348\,\mathrm{K}\) in \(25\,\mathrm{K}\) steps.

Linear Thermal Expansion Coefficient

The linear thermal expansion coefficient at \(\mathrm{298}\,\mathrm{K}\) is calculated via the temperature gradient of the respective average lattice parameter \(\left < a \right >\), \(\left < b \right >\) or \(\left < c\right >\) at constant pressure \(P\) (\(NPT\) ensemble)

\[ \alpha _{x}^{T_{\mathrm{298\,K}}} = \frac{1}{x} \left ( \frac{\partial x}{\partial T} \right ) _{P}, x={a,b,c} \]

The slope can be estimated numerically in different ways. A simple way is to employ a finite difference method. There are three fundamental types of this method, the forward, the backward and the central finite difference.

Linear Thermal Expansion Coefficient

In our case we use the central finite difference method with a five-point stencil. In general it is formulated in 1D as:

\[ \begin{align} f'(x) = \frac{-f(x+2h)+8f(x+h)-8f(x-h)+f(x-2h)}{12 h} + \\ +\frac{h^4}{30}f^{(5)(c)}, c \in [x-2h,x+2h] \end{align} \] where \(h\) is the change of \(x\).

If more than five points are used, a higher order stencil can be employed.

Linear thermal Expansion Coefficient

In our case the differentiation can be written as:

\[ \frac{\partial x}{\partial T} \approx \frac{ \left < x^{T_{\mathrm{248\,K}}}\right >-8 \left < x^{T_{\mathrm{273\,K}}}\right > + 8 \left < x^{T_{\mathrm{323\,K}}}\right > - \left < x^{T_{\mathrm{348\,K}}} \right > }{12\Delta T} \]

\[ x \in {a,b,c} \]

Volumetric Thermal Expansion Coefficient

The formulation is the same as for the linear thermal expansion coefficient, except that the volume \(V\) is used instead of the individual lattice parameters:

\[ \alpha _{V}^{T_{\mathrm{298\,K}}} = \frac{1}{V} \left ( \frac{\partial V}{\partial T} \right ) _{P} \]

The volume of an orthorhombic system can be calculated as: \[ V = a\cdot b \cdot c \]

In the general case of a non-orthorhombic crystal sytems the volume is calculated as: \[ V = a b c \sqrt{1-\mathrm{cos}^2(\alpha) -\mathrm{cos}^2(\beta) - \mathrm{cos}^2(\gamma)+ 2(\mathrm{cos}(\alpha) \mathrm{cos}(\beta) \mathrm{cos}(\gamma))} \]

Linear / Volumetric Thermal Expansion Coefficient

Question

How can you estimate the linear/volumetric thermal expansion coefficient in the lab?

Answer

X-ray diffraction (XRD) powder X-ray diffraction (PXRD) neutron powder diffraction (NPD)

Plot Lattice Parameter vs. Temperature

Radial Distribution Functions (RDFs)

The radial distribution function (RDF) is a simple measurement to get structural information of the system.

Radial Distribution Functions (RDFs)

It describes the probability \(P_{ab}(r)\) to find a target particle of type \(b\) at distance \(r\) from a reference particle type \(a\); e.g. the probability to find \(\mathrm{H}_2\) molecules at distance \(r\) from \(\mathrm{Zn}^{2+}\): \[ P_{ab}(r) = \int_{0}^{r'} 4\pi r'^2 g_{ab}(r') dr' \] where \(g_{ab}(r)\) is the RDF. The RDF formulates the average local density \(\left <\rho_{ab}(r)\right >\) normalized by \(\rho=(N_a N_b)\) \[ g(r) = \frac{\left <\rho_{ab}(r)\right >}{\rho} \]

Radial Distribution Functions (RDFs)

The two-particle density correlation function is defined as:

\[ \rho_{ab}(r) =\sum_{i=1}^{N_a} \sum_{j=1}^{N_b} \left < \delta(|\vec{r_{i}}-\vec{r_{j}}| -r)\right > \]

The average number of particles \(b\) that can be found in the shell of distance \(r\) can be determined by multiplying the density \(\rho\) with the probability \(G(r)\) \[ N_{ab}(r) = \rho G_{ab}(r) \]

Radial Distribution Functions (RDFs)

Dirc-Delta Function

\[ \delta(x-x_0) = \begin{cases} 0 & x \neq x_0\\ \infty & x = x_0\\ \end{cases} \] \[ \int_L \delta(x-x_0) dx = 1, x_0 \in L \]

Self-Diffusion Coefficient

In order to describe the dynamics of guest molecules in a host system, it is necessary to calculate transport properties. A transport property would be for example the diffusion, viscosity, electrical or thermal conductivity.

In general transport properties can be described as a property coefficient \(\gamma\) which depends on a microscopic variable \(A\) (e.g. positions \(\Delta \vec{r}\) or the velocities \(\vec{v}\)) that is written as an infinite time integration of a non-normalized time correlation function \(\left < \dot{A}(t)\dot{A}(t_0) \right >\) : \[ \gamma = \int_{0}^{\infty} dt \left < \dot{A}(t)\dot{A}(t_0) \right > \]

Self-Diffusion Coefficient

This is also known as the Green-Kubo relation, which can also be written as the equivalent Einstein expression:

\[ \gamma = \lim_{t\to\infty} \frac{\left < (A(t) - A(t_0))^2 \right >}{2t} = \frac{1}{2} \lim_{t\to\infty} \frac{d}{dt} \left < (A(t) - A(t_0))^2 \right > \]

Self-Diffusion Coefficient

In case of the self-diffusion coefficient \(D_{\mathrm{s}}\), the variable \(A\) is the molecular position \(\vec{r}_{\mathrm{cm}}\), which in general is the center of mass position of the particles: \[ \vec{r}_{\mathrm{cm}} = \frac{\sum_{i=1}^{N} m_i \vec{r}_i}{\sum_{i=1}^{N} m_i} \] where \(m_i\) is the atomic mass, \(\vec{r}_i\) the atomic positions and \(N\) the number of atoms in the particle.

\(\dot{A}\) is the molecular velocity and should also be considered with respect to the center of mass.

Einstein-Relation

The Einstein Relation can therefore be written as the slope of the mean-squared-displacement (\(MSD(\tau)\)) over time origins \(\tau\)

\[ D_{\mathrm{s}} = \frac{1}{2 \cdot d} \lim _{t\to \infty} \frac{d}{dt} MSD(\tau) \] where \(d=1,2,3\) is the dimensionality.

Mean-Squared Displacement

The mean-squared displacement describes the displacement of the particle from a time origin \(t_0\) averaged over the time interval \(\tau\) \[ MSD(\tau) = \left < \frac{1}{N} \sum_{i=1}^{N} |\vec{r} _{i}(t) - \vec{r} _{i}(t_0)|^2 \right >_{\tau} \] where \(N\) is the number of particles of that type and \(\vec{r}_{i}\) is the center-of-mass position vector of particle \(i\).

Mean-Squared Displacement

Looking at Figure Figure 3 (a), we see that the beginning of the MSD is not linear. Only at higher correleation times the expected diffusion (see Figure Figure 3 (b)) behavior is exhibited.

Question:

How do you get the self-diffusion coefficient out of the MSD regarding the Einstein-relation?
How do you get the gradient/slope of the MSD?
What does the limes in that formular mean?

Mean-Squared Displacement

Answer:

The self-diffusion coefficient is the slope of the MSD divided by 2 \(\cdot\,d\).

But this is only true if correlation time is long enough. We can say that in the linear regime we fulfill this limes due to unchanging slope which represents the difffusion coefficient.

Therefore it is important to have really linear behaviour in order to apply the Einstein relation.

You get the slope by fitting a linear regression at the last linear section.

Mean-Squared Displacement

Figure 3: (a) MSD(t) plot, (b) diffusion regimes

Green-Kubo Relation

The Green-Kubo formalism requires numerical integration of the velocity-autocorrelation function \(VACF(t)\) to get the self-diffusion coefficent \(D_{\mathrm{s}}\) \[ D_{\mathrm{s}} = \frac{1}{d} \int _{0}^{\infty} VACF(t) dt \] where \(d=1,2,3\) is the dimension.

Veloctiy-Autocorrelation Function

The Velocity-Autocorrelatin Function \(VACF(\tau)\) is averaged over a time interval \(\tau\) \[ VACF(\tau) = \left < \frac{1}{N} \sum_{i=1}^{N} |\vec{v} _{i}(t) \vec{v} _{i}(t_0)|^2 \right >_{\tau} \] where \(\vec{v}_{i}\) is the velocity the particle \(i\), and \(t_0\) labels the time origin.

Veloctiy-Autocorrelation Function

Activation Energy

The diffusion coefficient is directly depending on the activation energy \(E_{\mathrm{a}}\) by an exponential decacy \[ D_{\mathrm{s}}=D_0 e^{-\frac{E_{\mathrm{a}}}{RT}} \] where \(D_0\) is a constant factorn, \(R=8.3145\,\mathrm{JK^{-1}mol^{-1}}\) is the ideal gas constant and \(T\) is the temperature.

With this formalism we can apply an Arrhenius equation to fit a linear function to get the activation energy \(E_{\mathrm{a}}\) \[ \ln (D_{\mathrm{s}}) = -\frac{E_{\mathrm{a}}}{RT} + \ln{D_0} \]

Activation Energy

Figure 5: (a) D(T), (b) D(n), (c) ln(D(1000/T) and (d) Ea(n))