The Monty Hall Problem

Lately I’ve been brushing up my knowledge of probability and statistics and naturally came across the Monty Hall problem. According to Wikipedia, “no other statistical puzzle comes so close to fooling all the people all the time”. Even Paul Erdős remained unconvinced until he was shown a computer simulation of the problem. This prompted me to try it myself, just so that I could marvel at the result:

import numpy as np

trials = 10000
count1 = 0
count2 = 0
doors = [0, 1, 2]

for i in range(trials):
    car_door = np.random.choice(doors)
    first_choice = np.random.choice(doors)
    if first_choice == car_door:
        count1 += 1

    other_doors_with_goat = [door for door in doors
                             if door != car_door
                             and door != first_choice]
    revealed_goat = np.random.choice(other_doors_with_goat)
    second_choice = [door for door in doors
                     if door not in [first_choice, revealed_goat]][0]
    if second_choice == car_door:
        count2 += 1

print('Chance of winning without switching: {:.2f}'.format(count1/trials*100))
print('Chance of winning with switching: {:.2f}'.format(count2/trials*100))
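
Sure enough, the simulation reports a winning chance of roughly 33% without switching and roughly 67% with switching, so switching doubles your chances.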

Machine Learning and Thin Sections

The project I’m going to talk about here was an attempt to build a data-driven model for rock permeability from the corresponding petrographic variables, such as porosity, mud content, grain fractions and so on. These variables were obtained from applying the so-called point counting technique to rock thin sections like this one:

[Thin section image: GhawarBCGI]

In the end, each sample was characterized by 28 features.

Permeability of a given rock sample depends on the total amount of pore space, characterized by porosity, as well as on how the pore space is distributed within the rock, which we can express as some kind of probability density function of pore sizes. In essence, what I’m going to investigate is whether the information about the pore size distribution can be sufficiently captured by the bulk petrographic properties extracted from thin sections.

First we load the data (which I cannot share, unfortunately), keep only the permeable samples and discard dolomite rocks, because their permeability behavior is significantly different from that of limestones and will be studied separately.

import pandas as pd

data = pd.read_csv('ts.csv', na_values='-')
data.fillna(0.0, inplace=True)
data = data[(data['FACIES'] != 'DOLO') & (data['PERM'] != 0)]

This results in 572 samples with 28 features, of which 5 are categorical and the rest are numeric. Next we look for errors in the data. Errors can be detected because some features are volumetric fractions that should sum either to 100 percent or to the value of another feature. For instance, we have 11 different grain type features that should ideally sum to 100. Alternatively, grain fractions were not counted at all for some samples and hence should sum to 0.

# GRAINS
grain_component_columns = ['OOID', 'PELOID', 'FORAM', 'MOLL', 'ECHIN', 'CORAL',
                           'STROM', 'DALGAE', 'COATGRN', 'INTRAC', 'OTHERGRN']
grain_sums = data[grain_component_columns].sum(axis=1).values
wrong_grains = np.logical_not(np.logical_or(np.isclose(grain_sums, 100),
                                            np.isclose(grain_sums, 0.0)))
print(f"Number of incorrectly assigned grain fractions: {np.count_nonzero(wrong_grains)}")

# POROSITY
porosity_component_columns = ['Interparticle (BP)', 'Moldic (MO)', 'WP intraparticle',
                              'BC Intercrystalline', 'OTHERPOR']
porosity_sum = data[porosity_component_columns].sum(axis=1).values
porosity_difference = porosity_sum  - data['POR'].values
wrong_porosity = np.logical_not(np.isclose(porosity_difference, 0.0, atol=1e-3))
print(f"Number of incorrectly assigned porosity fractions: {np.count_nonzero(wrong_porosity)}")

# CEMENT
cement_component_columns = ['FC', 'MO', 'EC']
cement_sum = data[cement_component_columns].sum(axis=1).values
cement_difference = cement_sum  - data['Total Cement'].values
wrong_cement = np.logical_not(np.isclose(cement_difference, 0.0, atol=1e-3))
print(f"Number of incorrectly assigned cement fractions: {np.count_nonzero(wrong_cement)}")

# TOTAL VOLUME
total_volume_components = ['POR', 'Mud and matrix', 'TOTGRN', 'Total Cement']
volume_fractions_sum = data[total_volume_components].sum(axis=1).values
wrong_volume = np.logical_not(np.isclose(volume_fractions_sum, 100))
print(f"Number of incorrectly assigned volume fractions: {np.count_nonzero(wrong_volume)}")
Number of incorrectly assigned grain fractions: 96
Number of incorrectly assigned porosity fractions: 101
Number of incorrectly assigned cement fractions: 1
Number of incorrectly assigned volume fractions: 1

A significant number of samples contain errors. I decided to discard the samples with errors in the porosity-, cement- or volume-related features, and to drop the grain fraction features altogether as too unreliable, which lets me keep the samples whose only errors are in the grain features.
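
In code, the reduction might have looked roughly like this (a sketch of the bookkeeping; the categorical features are presumably one-hot encoded along the way, judging by the dummy column names further below):

# keep only samples whose errors (if any) are confined to the grain fractions
good_rows = ~(wrong_porosity | wrong_cement | wrong_volume)
data_red = data[good_rows].drop(columns=grain_component_columns)

# one-hot encode the categorical features
data_red = pd.get_dummies(data_red,
                          columns=['FACIES', 'TEXTURE', 'FIELD', 'ZONE', 'WELL'])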

Before moving forward, I log-transformed permeability values and then normalized all the features to zero mean and unit variance:

from sklearn.preprocessing import StandardScaler
from math import log10

X = data_red.drop('PERM', axis=1).values
y = data_red['PERM'].apply(lambda x: log10(x)).values

X_std = StandardScaler().fit_transform(X)

After that I performed feature selection using a range of selection algorithms: Pearson correlation, distance correlation (based on this gist), ranking based on elastic net regression, random forest ranking, stability selection using randomized lasso and recursive feature elimination.

from sklearn.ensemble import RandomForestRegressor 
from sklearn.feature_selection import f_regression
from sklearn.linear_model import ElasticNet

cols = data_red.drop('PERM', axis=1).columns
feature_rankings = pd.DataFrame(index=cols)

# univariate feature selection
def dist(x, y):
    return np.abs(x[:, np.newaxis] - y)

def d_n(x):
    d = dist(x, x)
    dn = d - d.mean(0) - d.mean(1)[:, np.newaxis] + d.mean()
    return dn

def dcov_all(x, y):
    dnx = d_n(x)
    dny = d_n(y)
    
    denom = np.prod(dnx.shape)
    dc = (dnx * dny).sum() / denom
    dvx = (dnx**2).sum() / denom
    dvy = (dny**2).sum() / denom
    dr = dc / (np.sqrt(dvx) * np.sqrt(dvy))
    return dc, dr, dvx, dvy

f_test, _ = f_regression(X_std, y)
f_test /= np.max(f_test)

dc_scores = []
for i in range(X_std.shape[1]):
    dc, _, _, _ = dcov_all(X_std[:, i], y)
    dc_scores.append(dc)
dc_scores = np.array(dc_scores)
dc_scores /= np.max(dc_scores)

feature_rankings['Pearson'] = f_test
feature_rankings['DC'] = dc_scores

# linear models with regularization
elastic_net = ElasticNet(alpha=0.7, l1_ratio=0.1)

elastic_net.fit(X_std, y)
feature_rankings['ElasticNet'] = elastic_net.coef_

# random forest feature importance
rf = RandomForestRegressor(random_state=0)
rf.fit(X_std, y)
feature_rankings['RF'] = rf.feature_importances_ / np.max(rf.feature_importances_)

# stability selection
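# note: RandomizedLasso was removed in scikit-learn 0.21, so this step needs an older release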
from sklearn.linear_model import RandomizedLasso
rlasso = RandomizedLasso(alpha=0.005, n_jobs=-1)
rlasso.fit(X_std, y)
feature_rankings['Stability'] = rlasso.scores_

# recursive feature elimination
from sklearn.feature_selection import RFE
selector = RFE(elastic_net, n_features_to_select=1)
selector.fit(X_std, y)
feature_rankings['RFE'] = selector.ranking_
print(feature_rankings)
Pearson DC ElasticNet RF Stability RFE
DEPTH 0.012409 0.043601 -0.000000 1.775766e-02 0.000 52
POR 1.000000 1.000000 0.300875 1.000000e+00 1.000 1
Mud and matrix 0.979700 0.944287 -0.232541 2.740140e-01 0.695 2
TOTGRN 0.329289 0.512823 0.125154 3.748489e-02 0.320 3
DoLo 0.160897 0.348428 -0.114194 2.268208e-02 0.805 4
Total Cement 0.072758 0.182865 0.000000 7.463309e-03 0.035 21
FC 0.091295 0.186574 0.000000 5.180301e-03 0.020 20
MO 0.066512 0.169349 0.000000 1.397019e-03 0.000 19
EC 0.000186 0.004129 0.000000 4.406653e-03 0.020 22
Interparticle (BP) 0.437091 0.652144 0.150489 2.734478e-02 0.290 6
Moldic (MO) 0.022927 0.065443 0.071612 1.832028e-02 0.270 11
WP intraparticle 0.021676 0.051974 0.000000 1.573746e-02 0.000 28
BC Intercrystalline 0.007280 0.014063 -0.000000 1.983509e-04 0.000 34
OTHERPOR 0.000575 0.002267 0.000000 6.326667e-03 0.100 35
Anhy 0.002769 0.002500 0.000000 8.223801e-04 0.005 23
FACIES_BCGI 0.001095 0.018077 0.020526 1.116013e-02 0.420 15
FACIES_CLADO 0.011248 0.017371 0.000000 3.931908e-04 0.000 33
FACIES_MIC 0.355562 0.596872 -0.104562 9.380061e-03 0.535 5
FACIES_SO 0.028381 0.086697 -0.000000 8.336911e-03 0.270 36
FACIES_SRAC 0.046250 0.107694 0.082938 4.219528e-03 0.615 7
TEXTURE_B 0.003209 0.002197 0.000000 1.573111e-07 0.000 40
TEXTURE_G 0.114598 0.212404 0.112852 6.026094e-03 0.525 8
TEXTURE_M 0.149274 0.175061 -0.083807 2.027910e-03 0.460 10
TEXTURE_MLP 0.181436 0.474996 0.089893 6.848541e-05 0.190 9
TEXTURE_P 0.023254 0.261818 -0.009810 1.449022e-02 0.070 17
TEXTURE_W 0.116260 0.226927 -0.042081 3.227577e-03 0.060 12
FIELD_ANDR 0.008156 0.006036 0.000000 7.963694e-05 0.000 51
FIELD_HRDH 0.001377 0.006590 0.000000 2.866362e-03 0.085 30
FIELD_HWYH 0.000298 0.001514 -0.000000 1.839563e-04 0.000 29
FIELD_SDGM 0.000316 0.008238 0.000000 2.149841e-03 0.000 27
FIELD_UTMN 0.004103 0.017334 -0.000000 1.365817e-03 0.110 25
ZONE_1 0.001849 0.002020 -0.000000 3.129510e-03 0.000 31
ZONE_2A 0.094149 0.238360 0.000000 1.640882e-03 0.000 32
ZONE_2B 0.023751 0.083241 0.026671 1.543156e-03 0.120 13
ZONE_3A 0.055387 0.167929 -0.008095 1.245921e-03 0.035 18
ZONE_3B 0.035680 0.073229 -0.000000 1.415927e-03 0.000 26
ZONE_4 0.043935 0.042175 -0.025119 1.454697e-03 0.105 14
WELL_32 0.003606 0.008269 -0.000000 9.772440e-03 0.030 24
WELL_45 0.008116 0.015835 0.016670 3.000428e-03 0.365 16
WELL_46 0.005061 0.005595 -0.000000 1.730220e-03 0.000 37
WELL_71 0.001412 0.002288 0.000000 3.930716e-04 0.000 38
WELL_98 0.000298 0.001514 -0.000000 1.102958e-03 0.000 39
WELL_193 0.004559 0.008605 0.000000 1.978925e-04 0.000 41
WELL_217 0.008156 0.006036 0.000000 5.853586e-05 0.000 42
WELL_222 0.001755 0.010385 0.000000 5.463477e-04 0.000 43
WELL_527 0.000064 0.004559 0.000000 1.092136e-03 0.000 44
WELL_530 0.000370 0.002327 -0.000000 5.450152e-04 0.000 45
WELL_547 0.000380 0.007311 -0.000000 2.273051e-03 0.000 46
WELL_586 0.000012 0.001414 0.000000 9.969440e-04 0.000 47
WELL_602 0.009914 0.024923 -0.000000 8.919203e-04 0.000 48
WELL_603 0.004082 0.006328 -0.000000 5.021024e-07 0.010 49
WELL_660 0.000223 0.006313 -0.000000 5.037259e-05 0.000 50

Based on these results I discarded data on sample location: field, geologic zone, well and depth. I also excluded anhydrite fraction as irrelevant.
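
In pandas terms this is a simple drop (again just a sketch):

location_columns = [col for col in data_red.columns
                    if col.startswith(('FIELD_', 'ZONE_', 'WELL_'))]
data_red = data_red.drop(columns=location_columns + ['DEPTH', 'Anhy'])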

After I was happy with the feature set, I moved on to nested cross-validation to pick the most appropriate model for training, but all the regression algorithms I tested returned very similar R2 scores. I therefore decided to examine the learning curves of different models. I began with only porosity and mud as input features (the original hypothesis offered by my colleague) and applied kernel ridge and gradient boosting regression models:

[Learning curves for kernel ridge and gradient boosting regression, porosity and mud features]
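
Learning curves like these can be generated with scikit-learn's learning_curve; here is a minimal sketch, where the kernel and hyperparameters are my assumptions rather than the exact settings used for the plots:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve

# start with porosity and mud content only
X_pm = StandardScaler().fit_transform(data_red[['POR', 'Mud and matrix']].values)

models = {'Kernel ridge': KernelRidge(kernel='rbf', alpha=1.0),
          'Gradient boosting': GradientBoostingRegressor(random_state=0)}

for name, model in models.items():
    sizes, train_scores, valid_scores = learning_curve(
        model, X_pm, y, train_sizes=np.linspace(0.1, 1.0, 10), cv=10, scoring='r2')
    plt.figure()
    plt.plot(sizes, train_scores.mean(axis=1), label='training')
    plt.plot(sizes, valid_scores.mean(axis=1), label='cross-validation')
    plt.title(name)
    plt.xlabel('Training set size')
    plt.ylabel('R^2')
    plt.legend()
plt.show()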

As one might expect, the tree-based method clearly overfits the data, while the quick plateauing of the training accuracy of kernel ridge regression indicates high bias, so we need to add more features to our samples. Adding all major volumetric components (porosity, mud, grain and cement total fractions) does not alter the result, but adding the dolomite fraction definitely improves the prediction:

[Learning curves for kernel ridge and gradient boosting regression, porosity, mud and dolomite features]

Any additional features simply increase the variance without improving the accuracy, with the full feature set being the most extreme case:

[Learning curves for kernel ridge and gradient boosting regression, all features]

Finally, here is the residuals graph for the kernel ridge regression with total volume fractions and dolomite fraction as features:

[Residuals of the kernel ridge regression model]
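
A residual plot of this kind can be built from cross-validated predictions; a minimal sketch (hyperparameters assumed):

from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

feature_cols = ['POR', 'Mud and matrix', 'TOTGRN', 'Total Cement', 'DoLo']
X_sub = StandardScaler().fit_transform(data_red[feature_cols].values)

krr = KernelRidge(kernel='rbf', alpha=1.0)
y_pred = cross_val_predict(krr, X_sub, y, cv=10)

plt.scatter(y_pred, y - y_pred, s=15)
plt.axhline(0.0, color='k', lw=1)
plt.xlabel('Predicted log10 permeability')
plt.ylabel('Residual')
plt.show()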

The residual distribution is definitely not random: we can see patterns and clustering at high permeability values. The resulting prediction errors are about an order of magnitude. So, in conclusion, I would say that the measured petrographic variables are not sufficient to explain the permeability behavior of the studied carbonate rocks. In hindsight, this is not very surprising, because volumetric fractions do not carry any information on how the pore sizes are distributed, which is a key control on permeability.

Random geometry problem

A friend of mine sent me this geometry problem the other day, where the task is to find the shaded area given that l=10\text{ cm}:

[Figure: problem sketch]

As the problem is symmetrical with respect to the square diagonal, we can safely focus on just one of the two shaded regions, say the top left one, which I’m going to call S. The way I approached the solution is by noticing that the shaded region is the difference between two circular segments cut off by the same chord AB: the first segment belonging to the circle centered at O with the radius l/2 and the second belonging to the circle centered at C with the radius l. The area A of a circular segment of radius r and central angle \theta is simply:

A = \frac{r^2}{2}(\theta - \sin\theta).

When we apply this formula to our problem, we get that

S = \frac{l^2}{8}(\alpha - \sin\alpha)-\frac{l^2}{2}(\beta-\sin\beta).

All that’s left is to find the angles \alpha and \beta. The central angle \theta subtended by a chord of length c in a circle of radius r satisfies:

\sin\theta/2=\frac{c}{2r}.

Again, in our case we have that

\sin\alpha/2=\frac{AB}{l}, \qquad \sin\beta/2=\frac{AB}{2l}, \qquad \sin\alpha/2 = 2\sin\beta/2.

We now have a link between \alpha and \beta. Finally, we find \beta from triangle AOC and the law of cosines:

AO^2=AC^2+OC^2-2AC\cdot OC\cos\beta/2,

\frac{l^2}{4}=l^2+\frac{l^2}{2}-2l\frac{\sqrt2}{2}l\cos\beta/2,

\cos\beta/2=\frac{5}{4\sqrt{2}}.

The rest is trivial trigonometry:

\sin\beta/2=\frac{\sqrt{7}}{4\sqrt{2}}, \qquad \sin\beta=\frac{5\sqrt{7}}{16},

\sin\alpha/2=\frac{\sqrt{7}}{2\sqrt{2}}, \qquad \cos\alpha/2=\frac{1}{2\sqrt{2}}, \qquad \sin\alpha=\frac{\sqrt{7}}{4}.

Now we can finally obtain our answer (remember that \alpha is obtuse):

S = \frac{l^2}{8}(\pi-\arcsin\frac{\sqrt{7}}{4} - \frac{\sqrt{7}}{4})-\frac{l^2}{2}(\arcsin\frac{5\sqrt{7}}{16}-\frac{5\sqrt{7}}{16})

= \frac{l^2}{8}(\pi-\arcsin\frac{\sqrt{7}}{4}) - \frac{l^2}{2}\arcsin\frac{5\sqrt{7}}{16} + \frac{\sqrt{7}}{8}l^2.

Together, the two shaded regions have an area of

\boxed{2S = \frac{l^2}{4}(\pi-\arcsin\frac{\sqrt{7}}{4}+\sqrt{7}) - l^2\arcsin\frac{5\sqrt{7}}{16}.}
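
As a quick numerical sanity check, plugging l = 10 cm into the boxed expression gives roughly 29.3 cm²:

from math import pi, asin, sqrt

l = 10.0
area = l**2 / 4 * (pi - asin(sqrt(7) / 4) + sqrt(7)) - l**2 * asin(5 * sqrt(7) / 16)
print(area)  # approximately 29.28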

Backstripping

Introduction

My current project involves developing a software tool that conducts the so-called backstripping analysis. In this post I would like to distill the information I’ve learned about this particular technique.

Backstripping simulation is a backward-in-time process where at each time step one removes the top-most (youngest) sediment layer together with its effects on the underlying sediments (via decompaction) and the Earth’s crust (isostasy). The purpose of this exercise is to compute the temporal evolution of the component of crustal subsidence that is not attributed to sediment loading. This subsidence is called thermo-tectonic, which essentially means that we have no clue as to why it happened but we suspect that it was caused by some combination of tectonic forces and thermal effects.

The idea behind backstripping is based on the following simplified view of the geological evolution. Let’s consider an abstract geological column at some location with water depth w and sediment thickness S, as well as the same location at some previous geological time when no sediments were present and the water depth was w_0. The difference in water depth can be attributed to several factors: the crustal deflection due to the sediment load \xi, the global sea level change \Delta, and finally the thermo-tectonic subsidence T.

[Figure: backstripping scheme]

As you can see from my impeccably drawn cartoon, these variables can be linked together via

T = w + S - \xi - \Delta - w_0.

The w_0 term (which is omitted in other materials on backstripping) makes sure that T_0 = 0, which I find more intuitive: the model calculates incremental subsidence with respect to t = 0, so it makes sense to me to define T_0 as the zero baseline.

Of all the symbols in this equation, the model expects w(t), \Delta(t) and S_p (the present-day sediment thickness) as inputs, and it computes \xi(t), S(t) and T(t) as outputs.

Crustal deflection

For 1D crustal deflection the simplest model is that of hydrostatic equilibrium (or Airy isostasy). It says that there is a level deep down in the Earth’s mantle at which the hydrostatic pressure remains constant regardless of what happens at the surface. When we push the crust down by \xi, we reduce the pressure contribution from the mantle w.r.t. this level by \rho_m g\xi. According to the isostasy theory, that same amount of pressure now needs to come from the load on top of the crust. We can see from the figure that the incremental crustal load has three components: sediment load, water load due to deflection and water load due to sea level change (g is cancelled in the following equation):

\rho_m\xi = \rho_bS + \rho_w(\xi-S) + \rho_w\Delta,

\xi = \frac{\rho_b-\rho_w}{\rho_m-\rho_w}S + \frac{\rho_w}{\rho_m-\rho_w}\Delta

By the way, \rho_m, \rho_b and \rho_w are densities of mantle, sediment column (b is for bulk density) and water respectively. When we substitute this expression for \xi back into the backstripping equation, we obtain the final result:

T = w - w_0 + \frac{\rho_m - \rho_b}{\rho_m - \rho_w}S - \frac{\rho_m}{\rho_m - \rho_w}\Delta.

Note that these equations also apply when sediments are exposed to air instead of water (e.g. mountains), if we set \rho_w = 0.
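
To get a feel for the magnitudes involved, take some typical densities, say \rho_m = 3300, \rho_b = 2300 and \rho_w = 1000 kg/m³: the sediment coefficient (\rho_m-\rho_b)/(\rho_m-\rho_w) is then about 0.43, meaning only roughly 43% of the sediment thickness has to be explained by thermo-tectonic subsidence; the rest of the accommodation was created by the crust flexing under the sediment’s own weight.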

Sediment decompaction

Let’s now look more closely at how to compute sediment properties S(t) and \rho_b(t). Bulk density is by definition

\rho_b(t) = \phi(t)\rho_w + [1-\phi(t)]\rho_s,

where \phi is porosity and \rho_s is the sediment rock density (e.g. pure quartz). Notice that the corresponding term in the expression for \xi is

(\rho_b(t)-\rho_w)S(t) = (\rho_s - \rho_w)[1-\phi(t)]S(t) = (\rho_s - \rho_w)h,

where h = [1-\phi(t)]S(t) is the “rock thickness” of the sediment column which does not change over time if we neglect chemical reactions. That is, a sediment layer may compact over time due to grains being squeezed together from the weight above, but the actual mass of rock remains the same.

This is good news because it means that there is no need to calculate \rho_b(t) over time. Instead, we can get away with calculating h only once from the present-day rock properties. But we still have to find S(t), and for this we need a decompaction model.

To derive it we will rely again on the fact that h is constant over time. We will also assume that the temporal change of porosity can be captured with a porosity-depth relationship \phi(z). In other words, porosity changes over time only due to the sediment’s progressive burial, and hence \phi(t) = \phi(z(t)).

Let’s consider an arbitrary sediment layer of thickness S_p and calculate its “rock thickness” h from the present-day stratigraphic information:

h = \int_{z_{\mathrm{max}}}^{z_{\mathrm{max}}+S_p}[1-\phi(z)]\mathrm{d}z = S_p - \int_{z_{\mathrm{max}}}^{z_{\mathrm{max}}+S_p}\phi(z)\mathrm{d}z.

Here z_{\mathrm{max}} is the maximum burial depth of our layer, which does not necessarily equal its present-day burial depth if a major erosional event caused its uplift sometime in the geologic past. From knowing h we can then compute S(z) at another burial depth z by finding the zero of the following non-linear function:

f(S) = S - \int_{z}^{z+S}\phi(z')\mathrm{d}z' - h,

which enforces sediment mass conservation during compaction.

In practical terms, we need to employ two numerical methods to model sediment compaction. The first is the numerical integration of the porosity-depth relationship. A typical theoretical \phi(z) function exhibits a smooth, gradual and monotonic change with depth, and hence a two-point Gauss quadrature is sufficient to approximate the integral:

\Delta z = \frac{z_2-z_1}{2}, \quad \bar{z} = \frac{z_1+z_2}{2},

\int_{z_1}^{z_2}\phi(z)\mathrm{d}z\approx\Delta z\left[\phi\left(\bar{z}-\frac{\sqrt{3}}{3}\Delta z\right)+\phi\left(\bar{z}+\frac{\sqrt{3}}{3}\Delta z\right)\right].

The second required numerical technique is the Newton algorithm for solving f(S)=0:

S^{(0)}=h,\quad S^{(j+1)}=S^{(j)}-\frac{f(S^{(j)})}{f^{'}(S^{(j)})}=S^{(j)}-\frac{f(S^{(j)})}{1-\phi(z+S^{(j)})}.

Sample implementation

Here I want to present a sample Python implementation of the described model in case anyone finds it useful. For our implementation we will use Athy’s porosity-depth relationship, \phi(z) = \phi_0 \exp(-cz), where \phi_0 is the surface porosity and c is the compaction rate. We will only consider depositional events, that is, once a layer has been deposited, it remains in place without ever being eroded.

We first implement the decompaction related functionality in the Layer class:

import math

from traits.api import (HasStrictTraits, Function, Float,
                        Property, cached_property)

class Layer(HasStrictTraits):
    maximum_burial = Float
    present_thickness = Float
    porosity_function = Function
    compaction_rate = Float
    surface_porosity = Float
    sediment_density = Float
    sediment_thickness = Property(depends_on=['present_thickness',
                                              'maximum_burial'])

    def set_rock_properties(self, properties_dict):
        self.surface_porosity = properties_dict['surface_porosity']
        self.compaction_rate = properties_dict['compaction_rate']
        self.sediment_density = properties_dict['sediment_density']

    @cached_property
    def _get_sediment_thickness(self):
        """ Compute sediment thickness. """
        z0 = self.maximum_burial
        z1 = z0 + self.present_thickness
        water_thickness = self.integrate_porosity_function(z0, z1)
        return self.present_thickness - water_thickness

    def integrate_porosity_function(self, z0, z1):
        """ Numerically integrate porosity function over the given interval. """
        w = 0.5773502691896257  # sqrt(3)/3
        halflength = 0.5 * (z1 - z0)
        midpoint = 0.5 * (z0 + z1)

        porosity_0 = self.porosity_function(self, midpoint + halflength * w)
        porosity_1 = self.porosity_function(self, midpoint - halflength * w)
        return halflength * (porosity_0 + porosity_1)

    def thickness_at_depth(self, depth, eps=1e-6):
        """ Computes layer's thickness if buried at a given depth. """
        thickness = self.present_thickness  # initial guess
        # Newton iteration
        carry_on = True
        while carry_on:
            water_thickness = self.integrate_porosity_function(depth, depth + thickness)
            function_value = thickness - self.sediment_thickness - water_thickness
            derivative_value = 1.0 - self.porosity_function(self, depth + thickness)
            thickness -= function_value / derivative_value
            carry_on = abs(function_value) > eps
        return thickness

    def sediment_weight(self, constants):
        """ Layer weight above that of water. """
        return (self.sediment_density - constants.water_density) \
            * constants.gravity * self.sediment_thickness

def athy_porosity(layer, z):
    """ Athy's porosity-depth relationship. """
    return layer.surface_porosity * math.exp(-layer.compaction_rate * z)
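
As a quick smoke test of the class (the numbers here are made up):

layer = Layer(maximum_burial=2000.0, present_thickness=500.0,
              porosity_function=athy_porosity)
layer.set_rock_properties({'surface_porosity': 0.63,
                           'compaction_rate': 0.51e-3,
                           'sediment_density': 2720.0})
print(layer.sediment_thickness)       # "rock thickness" h of the layer
print(layer.thickness_at_depth(0.0))  # decompacted thickness at the surface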

The Layer objects know how to compute their rock thickness and how thick they would be at different burial depths. To describe a collection of layers deposited at some time with a certain geologic context, we also introduce the Deposition class, as well as the EventManager class which holds a collection of deposition events and can perform calculations which require knowledge of the burial history:

from sortedcontainers import SortedListWithKey
from traits.api import HasStrictTraits, Float, Instance

from layer import Layer

class Deposition(HasStrictTraits):
    age = Float
    bathymetry = Float
    sea_level = Float
    layer = Instance(Layer)

class EventManager(HasStrictTraits):
    events = Instance(SortedListWithKey, kw={'key': lambda e: e.age})
    initial_age = Float
    initial_sea_level = Float
    initial_bathymetry = Float

    def add_events(self, events):
        self.events.update(events)

    def reconstruct_burial_history(self):
        """ Compute maximum burial depths for all the deposited layers. """
        current_burial = 0.0
        for event in self.events:
            event.layer.maximum_burial = current_burial
            current_burial += event.layer.present_thickness

    def decompact_layers(self, starting_event_id, constants):
        """ Decompaction of a sediment column. """
        current_burial = 0.0
        thickness_list = []
        weight_list = []
        for event in self.events[starting_event_id:]:
            thickness = event.layer.thickness_at_depth(current_burial)
            weight = event.layer.sediment_weight(constants)
            current_burial += thickness
            thickness_list.append(thickness)
            weight_list.append(weight)

        return thickness_list, weight_list

    def sea_level_change(self, event_id):
        """ Sea level change for a given event ID. """
        return self.events[event_id].sea_level - self.initial_sea_level

    def bathymetry(self, event_id):
        """ Water depth value for a given event ID. """
        return self.events[event_id].bathymetry

Assuming that we have a fully set up EventManager object, the actual backstripping algorithm is trivial to implement:

def compute_deflection(sediment_weight, sea_level_change, constants):
    """ helper function for Airy isostasy. """
    total_weight = sediment_weight + constants.gravity * constants.water_density * sea_level_change
    return total_weight / (constants.gravity * (constants.mantle_density - constants.water_density))

def compute_subsidence(event_manager, constants=PhysicalConstants()):
    """ Actual backstripping is performed here. """
    subsidence = []
    thickness_evolution = []
    for event_id in range(len(event_manager.events)):
        thickness, weight = event_manager.decompact_layers(event_id, constants)
        total_thickness = sum(thickness)
        total_weight = sum(weight)
        sea_level_change = event_manager.sea_level_change(event_id)
        bathymetry = event_manager.bathymetry(event_id)
        deflection = compute_deflection(total_weight, sea_level_change, constants)
        s = (bathymetry + total_thickness - deflection - sea_level_change
             - event_manager.initial_bathymetry)
        subsidence.append(s)
        thickness_evolution.append(thickness)
    return subsidence[::-1], thickness_evolution[::-1]

The compute_subsidence function returns the subsidence values as well as the whole sediment column thickness distribution for each geologic age, sorted from earliest to present-day.
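
The PhysicalConstants object used as the default argument above is not shown in this post; a minimal version could be as simple as the following (the numeric values are just typical choices):

from traits.api import HasStrictTraits, Float

class PhysicalConstants(HasStrictTraits):
    gravity = Float(9.81)           # m/s^2
    water_density = Float(1000.0)   # kg/m^3
    mantle_density = Float(3300.0)  # kg/m^3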

Example application

The example application is taken from the Basin Analysis book by Allen and Allen (2013). Here is the main file listing that contains all the data:

from backstripping import prepare_events, compute_subsidence, plot_results

# setup layers
rock_properties = [
    {'surface_porosity': 0.63, 'compaction_rate': 0.51e-3, 'sediment_density': 2720.0},
    {'surface_porosity': 0.49, 'compaction_rate': 0.27e-3, 'sediment_density': 2650.0},
    {'surface_porosity': 0.70, 'compaction_rate': 0.71e-3, 'sediment_density': 2710.0},
    {'surface_porosity': 0.40, 'compaction_rate': 0.60e-3, 'sediment_density': 2720.0},
    {'surface_porosity': 0.20, 'compaction_rate': 0.60e-3, 'sediment_density': 2870.0},
    {'surface_porosity': 0.05, 'compaction_rate': 0.20e-3, 'sediment_density': 2960.0},
]

ages = [260, 245, 210, 160, 145, 125, 100, 80, 55, 45, 0]
sea_levels = [10, 0, 0, -20, -40, 70, 80, 100, 50, 40, 0]
bathymetries = [-20, 0, 20, 10, 20, 20, 200, 300, 350, 325, 300]
rock_types = [4, 5, 1, 4, 3, 1, 2, 0, 1, 0]
thicknesses = [400, 750, 250, 400, 200, 900, 1300, 750, 250, 200]

event_manager = prepare_events(ages, bathymetries, sea_levels, thicknesses, rock_types, rock_properties)
subsidence, thickness_evolution = compute_subsidence(event_manager)
plot_results(ages, subsidence, thickness_evolution, sea_levels, bathymetries)
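
The prepare_events and plot_results helpers also live in the accompanying code rather than in this post. Roughly, prepare_events has to treat the first entries of ages, sea_levels and bathymetries as the initial state and turn the remaining entries into Deposition events; here is a sketch of my own reconstruction (not the original code), assuming it sits next to the Layer, Deposition and EventManager classes shown earlier:

def prepare_events(ages, bathymetries, sea_levels, thicknesses, rock_types, rock_properties):
    """ Turn the plain data lists into a ready-to-use EventManager (sketch). """
    event_manager = EventManager(initial_age=ages[0],
                                 initial_sea_level=sea_levels[0],
                                 initial_bathymetry=bathymetries[0])
    events = []
    for age, bathymetry, sea_level, thickness, rock_type in zip(
            ages[1:], bathymetries[1:], sea_levels[1:], thicknesses, rock_types):
        layer = Layer(present_thickness=float(thickness),
                      porosity_function=athy_porosity)
        layer.set_rock_properties(rock_properties[rock_type])
        events.append(Deposition(age=float(age), bathymetry=float(bathymetry),
                                 sea_level=float(sea_level), layer=layer))
    event_manager.add_events(events)
    event_manager.reconstruct_burial_history()
    return event_manager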

And here is the result:

[Figure: backstripping results, subsidence and sediment thickness through time]

As you can see, less than half of the present-day crustal depression at this location is due to tectonic forces; most of it is instead caused by the weight of the deposited sediments.

All the code necessary to run this example and generate this plot can be found here.