Convergent Cross Mapping (CCM)

From Chaos to Chaos-ality
Exercises
Learn more
Glossary

In this tutorial we present the general idea of Convergent Cross Mapping (CCM, Ye et al. 2017), then show some practical examples using synthetic data, and lastly we propose a real data analysis as exercise. For these activities, you will need the most recent version of R and the rEDM package installed in your working computer.

After running this tutorial, you can review the concepts of cross-mapping and learn details about the rEDM library in the introductory documentation of the package (vignette) which you can find here or in your local R installation with the command

vignette("rEDM-tutorial", package="rEDM")

From Chaos to Chaos-ality

General idea ¹

One of the corollaries of the Generalized Takens’s Theorem is that it should be possible to cross predict or cross map between variables that are observed from the same dynamical system. Consider two variables, X and Y, that interact in a dynamical system. Then the univariate reconstructions based on X alone should uniquely identify the system state and thus the corresponding value of Y, and vice versa.

Synthetic Data

First, let’s simulate a deterministic discrete-time dynamics with chaotic behavior. To create a 150-step time series following this dynamics, run the commands below in the R console:

## Two vectors to store data
X <- c()
Y <- c()
## Initial values
X[1] <- 0.1
Y[1] <- 0.1
## Iterate the dynamics 150 time steps
for(i in 2:150){
  X[i] <- 3.77*X[i-1]*(1-X[i-1])
  Y[i] <- 3.82*Y[i-1]*(1-Y[i-1]-0.05*X[i-1])
}
XY<-as.data.frame(cbind(X,Y))

Note that the variable \(X\) is causing changes in \(Y\). However, if we plot the \(X\) and \(Y\) time series of this chaotic system, they do not seem to be related:

par(cex=1.1,lwd=2)
plot(20:50,X[20:50],type="b", pch=18, col="blue",ylim=c(min(X,Y),max(X,Y)),
     main='Two Species',xlab = 'time',ylab='Population')
lines(20:50,Y[20:50],pch=19, col="red", type="b",lty=2,lwd=2)
legend(x = "bottomright", legend = c("X", "Y"),lty=c(1,2),pch=c(18,19),
       col = c("blue", "red"), inset = 0.02,lwd=2)

\(X\) and \(Y\) do not seem to be related, although we know that \(X\) and \(Y\) are coupled (it’s synthetic data!). Moreover, we can calculate the correlation coefficient between \(X\) and \(Y\):

fit<-lm(Y ~ X)
plot(X,Y,main='Correlation (X,Y)')
abline(0,1,lty=2)
#abline(fit$coefficients[1],fit$coefficients[2])
legend(x = "bottomleft", legend = paste('r =',round(cor(X,Y)*100)/100),
       inset = 0.02,col = 'black')

The results show that \(X\) and \(Y\) are not correlated, even though the data comes from a model with known causality (i.e. \(X\) is causing changes in \(Y\)). This is a clear example that “lack of correlation does not imply lack of causation”.

Cross Mapping²

How can we then extract the causality of \(X\) on \(Y\) from their dynamics? As we have seen in previous sections, a generic property of the reconstructed shadow manifold is that the states of \(X(t)\) on the shadow manifold (\(M_X\)) maps one-to-one onto states in the original attractor manifold \(M\), and local neighborhoods of \(M_X\) map onto local neighborhoods of \(M\).

It follows that for two variables \(X\) and \(Y\) that are dynamically coupled, local neighborhoods on their lagged reconstructions (\(M_X\) and \(M_Y\), respectively) will map to each other since \(X\) and \(Y\) are essentially alternative observations of the common original attractor manifold \(M\).

Convergent cross mapping (CCM) determines how well local neighborhoods on \(M_X\) correspond to local neighborhoods on \(M_Y\) and vice versa. To do so, we construct a manifold \(M_X\) from lags of the variable \(X\) and use it to predict the states of \(Y\) at the same time points. Similarly, a manifold \(M_Y\) is constructed from lags of the variable \(Y\) and used to predict the states of \(X\) at the same times.

To do so, we first need to obtain the optimal embedding dimension for both variables using the simplex function (see the tutorial about simplex projection for details).

options(warn = -1)
simplex_X<-simplex(X,silent=T)
plot(simplex_X$rho,type='o', ylab = "Forecast Skill (rho)",
     xlab="Embedding Dimension (E)")
E_star_X<-which.max(simplex_X$rho)
print(paste('E*(X) =',E_star_X))

## [1] "E*(X) = 2"

The optimal embedding dimension (\(E^*\)), that is, the one with larger prediction skill \(\rho\), is two.

simplex_Y<-simplex(Y,silent=T)
plot(simplex_Y$rho,type='o', ylab = "Forecast Skill (rho)",
     xlab="Embedding Dimension (E)")
E_star_Y<-which.max(simplex_Y$rho)
print(paste('E*(Y) =',E_star_Y))

## [1] "E*(Y) = 2"

\(E^*\) is also two for the shadow manifold of \(Y\).

Constructing the manifolds \(M_X\) and \(M_Y\)

Now that we have the optimal embedding dimensions, we can construct the shadow manifolds \(M_X\) and \(M_Y\) using the make_block function from the rEDM package. Use the command below to load this function into your R workspace (it will be included in rEDM itself in future releases of the package):

source("https://raw.githubusercontent.com/mathbio/edmTutorials/master/utilities/make_block.R")

Now let’s create the shadow manifolds for both \(X\) and \(Y\). Since \(E^* = 2\), we will use two dimensions to construct the shadow manifolds \(M_X\) and \(M_Y\). This means that \(M_X\) will be built based on \(X(t)\) and \(X(t+1)\), similarly \(M_Y\) is constructed from \(Y(t)\) and \(Y(t+1)\).

# max_lag is the optimal embedding dimension
Shadow_MXY<-make_block(XY,max_lag = 2)
Shadow_MX<-Shadow_MXY[,2:3]
Shadow_MY<-Shadow_MXY[,4:5]

head(Shadow_MXY)

##   time         X       X_1         Y       Y_1
## 1    1 0.1000000        NA 0.1000000        NA
## 2    2 0.3393000 0.1000000 0.3418900 0.1000000
## 3    3 0.8451417 0.3393000 0.8373481 0.3418900
## 4    4 0.4934071 0.8451417 0.3851034 0.8373481
## 5    5 0.9423361 0.4934071 0.8682788 0.3851034
## 6    6 0.2048571 0.9423361 0.2806179 0.8682788

The table above shows the \(X\) and \(Y\) time series and their respective lags \(X_1\) and \(Y_1\). The manifold \(M_X\) is thus composed of \(X\) and \(X_1\), and \(M_Y\) of \(Y\) and \(Y_1\).

Cross-mapping from \(M_X\) to \(M_Y\) (X_xmap_Y)

To better understand the cross mapping, let’s start by using \(X\) to predict one single value in \(Y\). Here, we are using the term “prediction”, but instead of predicting future values of \(X\) (as in the simplex tutorial), we will predict values of \(Y(t)\) using \(X(t)\) and vice-versa. This “cross-prediction” is performed between different variables but for the same point in time.

Suppose we want to predict the value \(Y(t=70)\):

predictor<-70
print(Y[predictor])

## [1] 0.8606044

We are going to use the nearest neighbours of \(X(t=70)\) within the manifold \(M_X\) to predict the value of \(Y(t=70)\). To do this, we first calculate a matrix of distances among all states of \(M_X\) and then we select the \(E^*+1\), in this example 2+1 = 3, nearest neighbours within the \(M_X\), creating the simplex_Mx.

dist.matrix_X <- as.matrix(dist(Shadow_MX, upper=TRUE))
neigb_X <- order(dist.matrix_X[predictor,])[2:4]
neigh_X_print<-c(neigb_X)
print(paste('simplex_Mx:',list(neigh_X_print)))

## [1] "simplex_Mx: c(31, 93, 130)"

We can also plot \(M_X\) with the predictor (red dot) and the respective neighbours simplex_Mx (blue dots). Zoom in to see all the dots.

The cross-mapping process starts by finding (mapping) the simplex_Mx onto \(M_Y\), creating the simplex_My. Note that simplex_My has the same indexes as simplex_Mx. The figure below shows \(M_Y\) and simplex_My (green dots).

The simplex_My will then be used to estimate the value of \(Y(t=70)\) in \(M_Y\), obtaining the predicted value \(\tilde{Y}\). We can expand this idea of predicting a single state of \(Y\) based on \(X\) to predict all the states of \(Y\) using \(X\). If we then do this prediction for every value of \(Y\), we will have a vector of the predicted states of \(Y\) (using \(X\)), besides the real observed states of \(Y\). We can then compare the predicted and observed:

lib <- c(1, NROW(Shadow_MXY))
block_lnlp_output_XY <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("X",
 "X_1"), target_column = "Y", stats_only = FALSE, first_column_time = TRUE)
observed_all_Y <- block_lnlp_output_XY$model_output[[1]]$obs
predicted_all_Y <- block_lnlp_output_XY$model_output[[1]]$pred
pred_obs_Y<-as.data.frame(cbind(predicted_all_Y,observed_all_Y)) #
colnames(pred_obs_Y)<-c('Predicted Y','Observed Y')
head(pred_obs_Y)

##   Predicted Y Observed Y
## 1   0.4975515  0.8373481
## 2   0.6043905  0.3851034
## 3   0.7493451  0.8682788
## 4   0.5365754  0.2806179
## 5   0.6601371  0.7601691
## 6   0.5649814  0.6072697

For the case of \(Y(t=70)\) we saw above, we find:

pred_obs_Y[(predictor-2),]

##    Predicted Y Observed Y
## 68   0.5652991  0.8606044

Note that the index of the predicted point is off by two. That’s because the first \(E^* = 2\) points cannot be predicted, so the prediction of \(Y(t=70)\) corresponds to the value at index 68 of the table.

Below is the plot for all predictions of \(Y\) (mapped from \(X\): X_xmap_Y) against the real observations, plus the Pearson’s correlation coefficient. The red dot represents the predicted and observed \(Y(t=70)\) used in the example above.

fit_YX<-lm(predicted_all_Y ~ observed_all_Y)
plot_range <- range(c(observed_all_Y, predicted_all_Y), na.rm = TRUE)
plot(observed_all_Y,predicted_all_Y, xlim = plot_range, ylim = plot_range, xlab = "Observed Y",
ylab = "Predicted Y")
#abline(fit_YX$coefficients[1],fit_YX$coefficients[2])
abline(0,1,lty=2)
legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_Y, predicted_all_Y)*100)/100),inset = 0.02,col = 'black')
observed_pred_Y<-observed_all_Y[predictor-2]
predicted_pred_Y<-predicted_all_Y[predictor-2]
points(observed_pred_Y,predicted_pred_Y,col='red',pch=16,cex=1.2)

Cross-mapping from \(M_Y\) to \(M_X\) (Y_xmap_X)

Similarly to the previous mapping, we can now do the opposite: use \(Y\) to predict values of \(X\). Now, we want to predict the value of \(X(t=70)\):

print(X[predictor])

## [1] 0.9219819

We obtain the indexes of the \(E^*+1\) nearest neighbors of the given predictor. Since we are now cross mapping from \(M_Y\) to \(M_X\), the simplex_My is generated directly from the shadow manifold \(M_Y\). Again, we first create a matrix of distances among the states of \(M_Y\) and then we select the \(E^*+1\) nearest neighbors.

dist.matrix_Y <- as.matrix(dist(Shadow_MY, upper=TRUE))
neigb_Y <- order(dist.matrix_Y[predictor,])[2:4]
neigh_Y_print<-c(neigb_Y)
print(paste('simplex_My:',list(neigh_Y_print)))

## [1] "simplex_My: c(135, 5, 54)"

The following plot presents \(M_Y\) with the predictor (red dot) and respective simplex_My (blue dots).

Next, we map the simplex_My in \(M_X\), creating the simplex_Mx. Analogously, simplex_Mx has the same indexes as simplex_My. The figure below shows \(M_X\) and simplex_Mx (green dots).

The simplex_Mx will then be used to estimate the predicted value of the predictor in \(M_X\), obtaining the predicted value \(\tilde{X}(t)\).

lib<-lib <- c(1, NROW(Shadow_MXY))
block_lnlp_output_YX <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("Y", "Y_1"),
                                   target_column = "X", stats_only = FALSE, first_column_time = TRUE)
observed_all_X <- block_lnlp_output_YX$model_output[[1]]$obs
predicted_all_X <- block_lnlp_output_YX$model_output[[1]]$pred
pred_obs_X<-as.data.frame(cbind(predicted_all_X,observed_all_X))
colnames(pred_obs_X)<-c('Predicted X','Observed X')
head(pred_obs_X)

##   Predicted X Observed X
## 1   0.7513307  0.8451417
## 2   0.4587703  0.4934071
## 3   0.9306259  0.9423361
## 4   0.2179312  0.2048571
## 5   0.6259113  0.6140977
## 6   0.9040863  0.8934210

Below is the plot for all predictions of \(X\) (mapped from \(Y\): Y_xmap_X) against the real observations, plus the Pearson’s correlation coefficient. The red dot represents the predictor (at \(t=70\)) used in the example above.

fit_XY<-lm(predicted_all_X ~ observed_all_X)
plot_range <- range(c(observed_all_X, predicted_all_X), na.rm = TRUE)
plot(observed_all_X, predicted_all_X, xlim = plot_range, ylim = plot_range,
     xlab = "Observed X", ylab = "Predicted X")
#abline(fit_XY$coefficients[1],fit_XY$coefficients[2])
abline(0,1,lty=2)
legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_X,predicted_all_X)*100)/100),
       inset = 0.02,col = 'black')
observed_pred_X<-observed_all_X[predictor-2]
predicted_pred_X<-predicted_all_X[predictor-2]
points(observed_pred_X,predicted_pred_X,col='red',pch=16,cex=1.2)

Note that the Pearson correlation of Y_xmap_X (r = 0.68) is different than the X_xmap_Y (r = -0.08). What is going on?

But what causes what?

The Pearson correlation of Y_xmap_X is much larger than the X_xmap_Y. When we used \(Y\) to estimate (map) \(X\), we obtained good predictions (r = 0.68). However, when we used \(X\) to estimate (map) \(Y\), the correlation between estimate and observed was quite low (r = -0.08). This means that the Y dynamics contains information about the dynamics of \(X\), and therefore we can use \(Y\) to estimate \(X\). This means that \(X\) somehow influences (causes!) \(Y\). Indeed, since all our data is synthetic, we know this is true: we created the variable \(X\) independently of \(Y\), and \(X\) was causing changes in \(Y\). So when \(X\) causes \(Y\), the cross mapping skill from \(M_Y\) to \(M_X\) (Y_xmap_X) will generally gives us good results (high correlations), but not X_xmap_Y. It is very important to notice that there is an inverse relation between cross mapping and causality: if \(X\) causes \(Y\) (\(X \rightarrow Y\)), \(Y\) cross map of \(X\) (Y_xmap_X) shows a positive (large) correlation.

Convergent Cross Mapping (CCM)

Convergence means that the estimates from the cross-mapping improve as we increase the length of the time-series. The length L is the sample size used to construct a library.

# cross map from X to Y
X_xmap_Y<- ccm(XY, E = 2, lib_column = "X", target_column = "Y",
               lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE,
               replace = TRUE)
# cross map from Y to X
Y_xmap_X<- ccm(XY, E = 2, lib_column = "Y", target_column = "X",
               lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE,
               replace = TRUE)

# mean values
X_xmap_Y_means <- ccm_means(X_xmap_Y)
Y_xmap_X_means <- ccm_means(Y_xmap_X)

# plot graphs
plot(X_xmap_Y_means$lib_size, pmax(0, X_xmap_Y_means$rho), type = "l", col = "red",
     main='Two Species', xlab = "Library Size (L)",
     ylab = "Cross Map Skill (Pearson rho)", ylim = c(0,1))
lines(Y_xmap_X_means$lib_size, pmax(0, Y_xmap_X_means$rho), col = "blue")
legend(x = "topleft", legend = c("X_xmap_Y", "Y_xmap_X"), col = c("red", "blue"),
       cex=1.1,lwd=2, inset = 0.02)

This means that the more data you have (i.e. the larger L), the more trajectories you have to infer the attractor, resulting thus in closer nearest neighbors and less estimation error (i.e. a higher correlation coefficient between estimated and observed). That is, if there is causation, we expect to see the convergence of the cross mapping (CCM).

Exercises

What happens when we invert the cause-effect relationship? Change the model so that the Y variable causes changes in X, but not the other way around.
What if there is no interaction between X and Y?
What about both variables interacting with each other (X causing X and Y causing X)?
Identifying causal links within systems is very important for effective policy and management recommendations on e.g. climate, ecosystem services, epidemiology and financial regulation. In the following exercise, use CCM to identify causality among sardine, anchovy, and sea surface temperature measured at Scripps Pier and Newport Pier, California.

data(sardine_anchovy_sst)
head(sardine_anchovy_sst)

##   year     anchovy   sardine     sio_sst     np_sst
## 1 1929 -0.00759874  1.770090 -0.35239300 -0.3478460
## 2 1930 -0.00960162 -1.151520  0.00114945  0.3287340
## 3 1931 -0.00844418 -1.420680  1.06822000  1.6102700
## 4 1932 -0.00835360  0.112222  0.53185700  1.2653400
## 5 1933 -0.00774971  1.515550 -0.55205800  0.0400459
## 6 1934 -0.00953117  2.989870 -1.14330000 -0.6913450

Learn more

Sugihara G, May R, Ye H, Hsieh C-h, Deyle E, Fogarty M, Munch S. 2012. Detecting Causality in Complex Ecosystems. Science 338:496–500.

Glossary

Attractor: the set of points to which the trajectories of a dynamical system converges over time.
Lagged variables plot: a plot of a time series as a function of the same time series lagged by multiples of a length of time \(\tau\). For instance, the axes of a three-dimensional lagged variable plot are the original time series \(X(t)\), the lagged time series \(X(t + \tau)\), and \(X(t + 2\tau)\).
State space reconstruction: the process of making a representation of a state space of interest that preserves some important properties. In the context of EDM, the reconsctructed state space corresponds to the lagged variables plot of a time series.
Shadow Manifold: the projection of the original manifold (which is an attractor in our case) into an Euclidean space. This space has a dimension (the embedding dimension) that ensures a one-to-one map to the original manifold.
Embedding dimension: for our purposes, the dimension of the Euclidean space used to build the shadow manifold. In the case of a state space reconstruction, the embedding dimension is the dimension of this space, that is, the number of axes of the lagged variables plot.

Taken from rEDM vignette section (“Causality Inference and Cross Mapping”).↩
This section is adapted from: Sugihara et al., Detecting Causality in Complex Ecosystems (Supplementary Materials), Science, 2012.↩

Convergent Cross Mapping (CCM)

School on Physics Applications in Biology, January 2018

Convergent Cross Mapping (CCM)

From Chaos to Chaos-ality

General idea 1

Synthetic Data

Cross Mapping2

Constructing the manifolds \(M_X\) and \(M_Y\)

Cross-mapping from \(M_X\) to \(M_Y\) (X_xmap_Y)

Cross-mapping from \(M_Y\) to \(M_X\) (Y_xmap_X)

But what causes what?

Convergent Cross Mapping (CCM)

Exercises

Learn more

Glossary

General idea ¹

Cross Mapping²