Introduction¶
Using machine and deep learning techniques to predict price movements is considered the holy grail of modern finance, attracting attention from individual professionals and hobbyists through to large multinational businesses, with mixed results. Given the stochastic nature of returns, it is debatable whether a reliable algorithm can be found that is actually effective in the "wild".
Operational complexities such as timely access to information, the brokerage spread and transaction costs make the task difficult before even considering the more philosophical question of how efficient the markets are. This is ratcheted up a notch by the irrational movements of crypto markets, which arguably have no underlying inherent value. Of course, this depends on who you ask. What is true is that crypto markets are very volatile, offering opportunities for large gains for those brave enough to take the risk.
Applying deep learning to these markets, whilst an interesting academic problem to explore, is unlikely to offer additional insight, and the below should not be used as a basis for any investment decisions.
A Note on the Project Workflow¶
The approach to this problem will follow the well trodden machine learning workflow as follows:
- Problem statement
- Data collection
- Exploratory data analysis
- Data cleaning
- Feature scaling and selection
- Model design and hypertuning
- Model evaluation
Given the subject matter is financial timeseries forecasting, the report will also include backtesting of the predictions versus a long only hold strategy, to understand if it achieves its objective of outperforming the market.
A full process diagram of how the problem was approached and the model built is included in Appendix 1.
Problem Statement¶
The objective is to produce a model that can predict positive moves using Long Short-Term Memory (LSTM) networks in short term financial time series.
I have chosen Ethereum (ETH) as the ticker to analyse (technically a pair with USD). Crypto markets are notoriously volatile and it seems like a decent challenge to try and tease some insight out of the mess.
For this purpose, I will aim to predict an hourly positive return. Defining a positive return is discussed in more detail as part of the labels section. This will be a binary classification problem with 1 being the label for a positive move and 0 otherwise.
Whilst accuracy of the predictions will be an important metric, precision, recall and F1 will arguably be more important as measures of success. Precision on the upward moves in particular appears important, as a false positive translates into a realised financial loss from buying and then selling at a loss in a high-frequency setting. A thorough discussion of metrics is considered in the following pages.
Data Collection¶
Raw Data¶
Access to data is one of the biggest hurdles for deep learning projects. The amount of data required to train a (good) deep learning neural network is usually much more than is available outside of a professional setting. High frequency intraday data especially is difficult to come by, presumably due to the differences between exchanges, the cost of storage and how valuable the data can be.
After exploring Yahoo Finance (via the yfinance python package) and the Alpha Vantage API, it became apparent that these sources did not provide the volume or reliability of data required.
In the end, the data was sourced from https://www.cryptodatadownload.com/data/ from the data available from the Gemini exchange.
The data appears to be relatively complete on an arbitrary inspection and shows the meteoric rise of the crypto markets generally in 2021 and 2022, followed by the collapse in price due to the FTX and LUNA scandals. The raw data goes back to 2016, but is truncated in the above chart for the reasons explained in the next few paragraphs.
The crypto market is notoriously emotion driven. Even glancing at social media or news outlets allows a person to gain a sense of how this is true. It follows then that some kind of sentiment regarding this emotional investing would potentially give some interesting insight into the problem statement. There is an interesting resource updated daily on alternative.me called the fear and greed index.
The index takes a weighted approach to a number of factors across 5 (formerly 6) data sources. A numerical value is assigned which falls into one of the following categories:
- Extreme Fear
- Fear
- Neutral
- Greed
- Extreme Greed
The index is updated daily at 00:00 UTC.
timestamp | fg_value | fg_value_classification |
---|---|---|
2023-12-23 00:00:00+00:00 | 70 | greed |
2023-12-22 00:00:00+00:00 | 74 | greed |
2023-12-21 00:00:00+00:00 | 70 | greed |
2023-12-20 00:00:00+00:00 | 74 | greed |
2023-12-19 00:00:00+00:00 | 73 | greed |
The index could be a good indicator of sentiment in the crypto market as a whole. Crypto tokens do not have fundamental data so a traditional fundamental analysis cannot be undertaken. However, there are metrics associated with blockchains that can be accessed (such as transactions per second, blocks mined etc) that could be worth exploring in a further analysis but are outside the scope of this paper.
The index only began on 1 February 2018 so all price data before this date has been dropped and the daily metric forward filled to the hourly data, reflecting that the index applies to the price movements throughout that day.
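The sketch below illustrates this join and forward fill, assuming two DataFrames are already loaded: `hourly` (hourly OHLCV with a DatetimeIndex) and `fg` (the daily fear and greed index with columns `fg_value` and `fg_value_classification`); the names are assumptions for illustration.

```python
import pandas as pd

# Join the daily index onto the hourly data; daily values land on the 00:00 UTC rows only.
merged = hourly.join(fg, how="left")

# Carry each daily reading forward across that day's hourly rows.
merged[["fg_value", "fg_value_classification"]] = (
    merged[["fg_value", "fg_value_classification"]].ffill()
)

# The index only begins on 1 February 2018, so earlier price data is dropped.
merged = merged.loc["2018-02-01":]
```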
Feature Engineering¶
Feature engineering is the catch-all term for using domain knowledge to generate insights from the raw dataset. Common data transformations for financial time series are known as technical analysis, with associated literature that spans many volumes.
Using pandas-ta, I have generated standard technical indicators for the data based on high, low, open, close and volume. I have also generated temporal data to investigate whether there is any kind of seasonality to returns. I have used the scikit-learn OneHotEncoder to encode these values (and the FG classification) into binary variables.
Interestingly, the pandas-ta module implements TA-Lib and candlestick patterns. Reading candlesticks is a classical form of technical analysis, originating from the rice markets of Japan. It will be interesting to see which patterns precede an upward tick.
Labelling the Data¶
Given the problem statement is to predict an hourly positive return, the 1-period return is calculated as follows, where $p$ is the closing price:
$$ r_t = ln(\frac{p_t}{p_{t-1}}) $$
A practical approach to predicting a positive return for these purposes would be any net return (i.e. after transaction costs).
Here are the fees from the Gemini exchange for reference. The taker fee at the lowest volume per month is 0.4%. To account for interest on margin, I will round this up to 0.5% as an estimate.
Therefore, a label of 1 will mean that the return in the next hour is greater than 0.5% and 0 otherwise. Mathematically:
$$ y_t = \begin{cases} 1 & \quad \text{if } r_{t+1} > 0.005\\ 0 & \quad \text{otherwise} \end{cases} $$
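A minimal sketch of this labelling rule, assuming `df` has a `close` column of hourly closing prices:

```python
import numpy as np

# r_t = ln(p_t / p_{t-1}): one-period log return.
df["return"] = np.log(df["close"] / df["close"].shift(1))

# Label is 1 if the *next* hour's return exceeds the 0.5% cost estimate, 0 otherwise.
df["label"] = (df["return"].shift(-1) > 0.005).astype(int)

# Drop the first row (no prior close) and the last row (no next-hour label).
df = df.iloc[1:-1]
```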
Exploratory Data Analysis¶
EDA is an important step in any machine learning workflow. Initial analysis of the data after engineering the technical indicators indicated that some needed to be removed. This is discussed more in the cleaning section.
Some observations of the above:
- A simple histogram of the return and the fg_value indicates, as expected, that the returns are clustered around 0 but there are some significant outliers and high peaks.
- The fear and greed value is distributed towards the lower end, possibly indicating that, overall, fear dominated the crypto market in the time period in question.
Some observations from the above data:
- Return does not seem to be correlated with fg_value. However, at higher fg_values (i.e. more greed in the index), the variance appears to decrease (heteroscedasticity).
- There appears to be a slight negative correlation between fg_value and volume, indicating that less volume is traded during times of greed in the index.
- Return does not have significant outliers, but volume does look like a candidate for the robust scaler.
label | count |
---|---|
0 | 40858 |
1 | 9809 |
The above does indicate that there is quite a severe class imbalance in the data, which will need to be addressed at the model-building stage; otherwise the model will likely underperform due to bias.
Next, I have examined the correlation between features. Collinearity between features is present and there are a number of features that will need cleaning before they can be used in the model.
The heatmap of cleaned features shows clear collinearity between a number of features. Collinearity in ML problems affects performance and interpretability and so is generally best removed. There are multiple methods of doing this but in this paper, I have focused on the following two methods:
- Only retaining the first variable in a highly correlated pair
- Discarding variables with a variance inflation factor of greater than 5
The first method is self explanatory, the second is defined as:
$$ VIF_i = \frac{1}{1-R^{2}_{i}} $$
Where $R^{2}_{i}$ is the unadjusted coefficient of determination from regressing variable $i$ on all the remaining independent variables. A VIF equal to 1 indicates that the variable is not correlated with the others; a value between 1 and 5 indicates moderate correlation between that variable and the others; and a value greater than 5 indicates high correlation with other variables.
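Minimal sketches of the two collinearity filters are below; the 0.9 correlation cut-off for the pairwise method is an assumption, as the report does not state the threshold used.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_pairwise_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep only the first variable of each highly correlated pair."""
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Iteratively drop the feature with the highest VIF until all VIFs are <= threshold."""
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            return X
        X = X.drop(columns=[vifs.idxmax()])
```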
Data Cleaning¶
As already alluded to in the preceding sections, the joining of the two raw data sources and the computation of the technical indicators requires some cleaning. A number of steps have been taken to deal with this in the clean_scale.py script. These are:
- Drop all columns where pandas.ta has calculated NaN.
- Drop columns where 20,000 datapoints (out of 65k+) are missing.
- Remove the leading NaN rows of data due to calculation of rolling amounts (simple moving averages etc.).
- Removal of all columns with no variance, indicating a single value. A column with no variance carries no informative content for the algorithm.
After this process, there are 324 features remaining in the dataset with hourly data from 26/02/2018 until 11/12/2023.
Feature Scaling¶
In order to get the best results from deep learning models, data generally needs to be scaled to aid in faster calculation of cost functions during gradient descent. There are various scaling techniques commonly used but this paper concentrates on two.
Min Max Scaler¶
The min max scaler rescales all features to within a range based on the following calculation:
$$ x_{scaled} = \frac{x_i - min(x)}{max(x) - min(x)} $$
This scaler is relatively sensitive to outliers but is generally good on many financial time series problems.
Robust Scaler¶
The Robust Scaler scales variables with significant outliers by using the quartiles of $x$ to scale the variables.
$$ x_{scaled} = \frac{x_i - Q_{1}(x)}{Q_{3}(x) - Q_{1}(x)} = \frac{x_i - Q_{1}(x)}{IQR(x)} $$
A significant outlier is defined for the purposes of this paper as one that is 10 times the IQR. Appendix 2 lists each feature remaining after cleaning the data and the scaler applied to each.
Split Data into Train and Test¶
Before applying the chosen scaling methods to each column, the data needs to be split into train and test data. This is because the scaling algorithm will be fit to the training data only, and the test data scaled using metrics calculated on the training data only. This technique helps to avoid data leakage of the test data into the training dataset and also helps with regularization. Note that as this is time series data, the data should not be shuffled.
Once the train and test data are split, the train data is split again into a train and validation set to test the model during the training process.
All sets are then scaled in clean_scale.py using statistics calculated from the training set only to avoid any data leakage.
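The sketch below shows one way this split-then-scale step could look. The 70/15/15 split fractions are assumptions (the report does not state them), `X` is the cleaned feature DataFrame in time order, and `pick_scaler` is one reading of the 10x IQR rule described above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, RobustScaler

n = len(X)
train_end, val_end = int(n * 0.70), int(n * 0.85)
X_train = X.iloc[:train_end].copy()          # no shuffling: time order preserved
X_val = X.iloc[train_end:val_end].copy()
X_test = X.iloc[val_end:].copy()

def pick_scaler(col: pd.Series):
    """RobustScaler for columns with points beyond 10x the IQR, MinMaxScaler otherwise."""
    q1, q3 = col.quantile(0.25), col.quantile(0.75)
    iqr = q3 - q1
    has_outliers = ((col < q1 - 10 * iqr) | (col > q3 + 10 * iqr)).any()
    return RobustScaler() if has_outliers else MinMaxScaler()

for name in X_train.columns:
    scaler = pick_scaler(X_train[name]).fit(X_train[[name]])   # statistics from train only
    for split in (X_train, X_val, X_test):
        split[name] = scaler.transform(split[[name]]).ravel()  # no leakage into val/test
```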
Feature Selection¶
There are many (and varied) techniques for feature selection, and feature engineering as a whole, necessitating experimentation to try to meet the main objective of an efficient model with good predictive power.
To attempt to achieve this, I have split the problem into four stages:
- Removal of collinearity using one of the two techniques described above.
- Use of boruta as a feature selection algorithm.
- Dimension reduction using Uniform Manifold Approximation & Projection (UMAP), a relatively new and novel unsupervised learning algorithm.
- Input the results of the above pipeline (or part thereof) into a baseline one-layer LSTM model using keras and analyse the results.
The best performing pipeline above will be chosen to test other model architectures.
Boruta¶
The Boruta algorithm is designed around a random forest classifier. It seeks to establish what features contribute to the overall model. It does this by duplicating and shuffling the dataset into "shadow features". The classifier (in this case, a random forest) is then trained on both sets of data. Feature importance is compared to the shadow features. If the feature has a greater importance than its shadow equivalent, then the feature is retained.
The algorithm is implemented in Python through the boruta_py package. In order to capture as much data as possible, the parameter "perc" was set at 90 in line with the documentation, so as to avoid too "strict" an interpretation of importance.
UMAP¶
Per the documentation, UMAP is a dimension reduction technique that can be used for visualisation similarly to t-distributed Stochastic Neighbor Embedding (t-SNE), but also for general non-linear dimension reduction. The mathematics can be found in McInnes, L. & Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. According to the literature, UMAP offers significant performance improvements over other dimensionality reduction techniques such as t-SNE and SOM. PCA was discounted in this analysis as it does not tend to work well on non-linear data.
The python implementation of UMAP takes a number of parameters, but there are two main parameters that impact the clustering of the algorithm on the 2D plane.
Hyperparameter | Description | Value |
---|---|---|
n_neighbors | This parameter controls how UMAP balances local versus global structure in the data. It does this by constraining the size of the local neighborhood UMAP will look at when attempting to learn the manifold structure of the data. Lower values of n_neighbors force the algorithm to focus more on the local structure, potentially losing some of the global structure, and vice versa. | 10 |
min_dist | This parameter controls how tightly UMAP is allowed to pack points together in the 2D representation. A lower min_dist generally means that points will clump together more. The choice of min_dist will depend on the use case of the algorithm, with a lower value generally being more useful for clustering problems. | 0.1 (default) |
Appendix 3 explores the output of the algorithm on changing the above parameters and demonstrates why the above were chosen as a middle ground on local and global structure to use for dimension reduction of the dataset.
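With the parameters above, the dimension-reduction step reduces to a short call; `X_selected` (the scaled, selected training features) is an assumed name.

```python
import umap

# UMAP embedding with the parameters chosen above.
reducer = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(X_selected)   # shape: (n_samples, 2)
```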
Baseline Model¶
In order to evaluate the various pipelines and understand the best set of features for this particular problem statement, a baseline deep learning model was used. This consisted of a single LSTM layer with 36 units and 'relu' activation, being:
$$ x^+ = \begin{cases} x & \quad \text{if } x > 0\\ 0 & \quad \text{otherwise} \end{cases} $$
The past 6 hours (i.e. the current and preceding 5) of data were used as the sequence length, on the basis that this seemed like a reasonable amount of time to use in predicting an up movement in 1 hour's time, while also being a good middle point in terms of performance. The model structure is detailed in the below diagram.
The model input is a 3-dimensional tensor in the form (batch, sequence length, features). The output of the dense layer is a probability $p$ that the label will be 1, such that:
$$ \text{prediction} = \begin{cases} 1 & \quad \text{if } p > 0.5\\ 0 & \quad \text{otherwise} \end{cases} $$
The optimizer used is Adam (see Kingma et al. 2014) and the loss function is binary cross entropy, being the most appropriate loss function for evaluating binary classification problems.
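A minimal sketch of this baseline model is below; the metric names and the `N_FEATURES` placeholder are assumptions, and the feature count should be set to the output of whichever pipeline is being tested.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 6        # current hour plus the preceding 5
N_FEATURES = 100   # placeholder: number of features from the chosen pipeline

# Single-layer LSTM baseline with a sigmoid output giving p = P(label = 1).
model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.LSTM(36, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=[keras.metrics.BinaryAccuracy(name="binary_accuracy"),
             keras.metrics.Precision(name="precision"),
             keras.metrics.Recall(name="recall")],
)
```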
Before training, the class imbalance was dealt with by assigning a weight to each class. Each weight is calculated as 1 divided by the count of the class, multiplied by the total length of the array divided by 2. The resultant python dictionary is passed to the keras fit method as the "class_weight" parameter.
An early stopping callback was used, monitoring the validation loss with a patience of 10 (i.e. training stops 10 epochs after the validation loss has stopped decreasing).
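A minimal sketch of the class-weight and early-stopping setup follows; `y_train`, `train_ds` and `val_ds` are assumed names from the steps above, and restoring the best weights is an assumption rather than a stated choice.

```python
import numpy as np
from tensorflow import keras

# Class weights: (1 / class count) * (total / 2), as described above.
counts = np.bincount(y_train)
total = len(y_train)
class_weight = {0: (1 / counts[0]) * (total / 2.0),
                1: (1 / counts[1]) * (total / 2.0)}

# Stop training 10 epochs after the validation loss stops improving.
early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                               restore_best_weights=True)

history = model.fit(train_ds, validation_data=val_ds, epochs=100,
                    class_weight=class_weight, callbacks=[early_stopping])
```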
Experiment Results¶
run_name | max_binary_accuracy | max_binary_accuracy_epoch | max_precision | max_precision_epoch | max_recall | max_recall_epoch | max_f1 | max_f1_epoch |
---|---|---|---|---|---|---|---|---|
run_pairwisecorr__boruta_01-14-2024-14:16:04 | 0.698578 | 13 | 0.392012 | 13 | 0.885132 | 3 | 0.484584 | 17 |
run_boruta_01-14-2024-14:40:32 | 0.581991 | 4 | 0.329438 | 4 | 0.906438 | 1 | 0.468319 | 4 |
run_vif_01-14-2024-14:23:39 | 0.641601 | 28 | 0.342770 | 28 | 0.928208 | 0 | 0.466315 | 16 |
run_vif__boruta_01-14-2024-14:31:03 | 0.583992 | 34 | 0.324927 | 34 | 0.858268 | 4 | 0.457744 | 2 |
run_vif__umap_01-14-2024-14:50:45 | 0.656767 | 11 | 0.345592 | 11 | 0.781843 | 0 | 0.435312 | 9 |
run_pairwisecorr_01-14-2024-14:15:30 | 0.562085 | 0 | 0.274124 | 0 | 0.930523 | 9 | 0.403786 | 16 |
run_all_01-14-2024-14:14:58 | 0.706477 | 2 | 0.353408 | 2 | 0.963872 | 5 | 0.400226 | 18 |
run_boruta__umap_01-14-2024-14:45:33 | 0.674882 | 7 | 0.332054 | 9 | 0.907365 | 0 | 0.391881 | 1 |
run_pairwisecorr__boruta__umap_01-14-2024-14:19:45 | 0.718483 | 10 | 0.374512 | 10 | 0.358962 | 2 | 0.365049 | 2 |
run_umap_01-14-2024-14:50:18 | 0.688046 | 10 | 0.260012 | 10 | 0.554887 | 1 | 0.331351 | 1 |
An F1 score has been calculated at each epoch using the precision and recall. The F1 metric attempts to evaluate the model on its class-wise performance and is the harmonic mean of the precision and recall scores. Mathematically:
$Precision = \frac{TP}{TP + FP}$
$Recall = \frac{TP}{TP + FN}$
$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$
where
$TP = \text{True Positive}$
$FP = \text{False Positive}$
$FN = \text{False Negative}$
For a given class:
- A high precision and a high recall - The class has been well managed by the model
- High precision and low recall - The class is not well detected but when it is, the model is very reliable.
- Low precision and high recall - The class is well detected, but also includes observations of other classes.
- Low precision and low recall - the class has not been handled well at all
The use of F1 is due to the fact that precision and recall are often antagonistic. The F1 score measures both precision and recall in one measure. The higher the F1 score, the better the overall model in classifying both classes.
The F1 score has been used as the principal metric on which to evaluate the model as it offers a good "all encompassing" metric where neither precision nor recall are obviously more important.
Pipeline Selection¶
From the above results, the preprocessing that gives the highest F1 is pairwise reduction in correlation followed by using Boruta to assess the remaining features. This is the preprocessing pipeline I have adopted for the remainder of this report.
Deep Learning Model¶
A Sensible Baseline to Beat¶
In order to judge if the model is achieving its stated objective, a sensible baseline to beat should be established. Given the class counts above, predicting a 0 every time would result in an accuracy of 0.8064 (though no chance of a profit, as the investor would never take a position). This would result in a precision and recall on class 1 of 0, which should be beatable.
If we can approach this sensible baseline, then a backtest would determine if a strategy based on these signals would result in any profit, over and above a long hold.
Tested Models¶
In line with the requirements of the project, LSTM models have been used to attempt the classification problem. An LSTM layer is a type of recurrent neural network that enables the learning of long-term dependencies. They were first proposed in Hochreiter & Schmidhuber (1997) and refined since then. The model structure is included below.
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
The key to the LSTM is the cell state, which is represented by the horizontal line in the above diagram. This line carries the prior data and passes through a number of gates that have the ability to add or remove information. These models are widely used in a number of machine learning problems.
For the basis of this paper, 1-, 2- and 3-layer LSTMs were considered when evaluating the predictive power of the model. The additional layers add complexity and computing time to the model, and this needs to be weighed against the predictive power.
Dropout layers, whilst useful for regularization of other ML problems, are known to hinder learning in RNNs. In line with Gal & Ghahramani (2016), the recurrent dropout parameter was used to introduce dropout into the models. This parameter uses the same dropout mask on each unit. Using the same dropout mask at every timestep allows the network to properly propagate its learning error through time, whereas a temporally random dropout would disrupt this error signal (Deep Learning with Python, Chollet).
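A minimal sketch of one of the deeper variants with recurrent dropout is below; the unit count and dropout rate are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_two_layer_lstm(seq_len: int, n_features: int,
                         units: int = 36, rec_dropout: float = 0.2):
    """Two stacked LSTM layers, each applying the same dropout mask at every timestep."""
    return keras.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(units, recurrent_dropout=rec_dropout, return_sequences=True),
        layers.LSTM(units, recurrent_dropout=rec_dropout),
        layers.Dense(1, activation="sigmoid"),
    ])
```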
Each of these models was tested using 4 lookback periods, being 1, 6, 12 and 24 hours. The results of the experiments are presented below.
Experiment Results¶
run_name | max_binary_accuracy | max_binary_accuracy_epoch | max_precision | max_precision_epoch | max_recall | max_recall_epoch | max_f1 | max_f1_epoch |
---|---|---|---|---|---|---|---|---|
baseline_24_hour | 0.712251 | 7 | 0.405663 | 7 | 0.862117 | 5 | 0.493919 | 11 |
two_layer_dropout_24_hour | 0.658436 | 18 | 0.372197 | 18 | 0.912256 | 1 | 0.493506 | 18 |
two_layer_12_hour | 0.686268 | 22 | 0.388934 | 22 | 0.906858 | 1 | 0.490676 | 22 |
three_layer_dropout_12_hour | 0.623353 | 11 | 0.353416 | 11 | 0.910565 | 9 | 0.488552 | 11 |
baseline_dropout_24_hour | 0.688720 | 17 | 0.389321 | 17 | 0.896472 | 24 | 0.486957 | 17 |
two_layer_dropout_6_hour | 0.642654 | 30 | 0.359836 | 30 | 0.935618 | 0 | 0.482853 | 30 |
three_layer_12_hour | 0.641058 | 25 | 0.358053 | 25 | 0.933735 | 1 | 0.480317 | 25 |
baseline_12_hour | 0.616503 | 16 | 0.345375 | 16 | 0.886932 | 7 | 0.476177 | 16 |
baseline_dropout_6_hour | 0.612428 | 28 | 0.342449 | 28 | 0.900417 | 0 | 0.475862 | 20 |
three_layer_dropout_6_hour | 0.609689 | 38 | 0.341398 | 38 | 0.942103 | 1 | 0.473280 | 38 |
two_layer_6_hour | 0.638968 | 6 | 0.353903 | 6 | 0.908754 | 12 | 0.472778 | 6 |
two_layer_24_hour | 0.645141 | 28 | 0.354513 | 28 | 0.902971 | 0 | 0.470426 | 29 |
two_layer_dropout_12_hour | 0.580883 | 23 | 0.327153 | 31 | 0.900371 | 1 | 0.468639 | 31 |
three_layer_6_hour | 0.583886 | 10 | 0.329333 | 10 | 0.911070 | 0 | 0.466730 | 10 |
baseline_dropout_12_hour | 0.579408 | 25 | 0.327369 | 25 | 0.926784 | 6 | 0.465515 | 25 |
baseline_6_hour | 0.600527 | 21 | 0.332238 | 21 | 0.902733 | 0 | 0.463391 | 17 |
baseline_1_hour | 0.626526 | 23 | 0.340106 | 23 | 0.874711 | 3 | 0.461673 | 25 |
baseline_dropout_1_hour | 0.629579 | 24 | 0.342326 | 24 | 0.874711 | 3 | 0.461654 | 18 |
three_layer_dropout_1_hour | 0.565895 | 5 | 0.321109 | 5 | 0.926491 | 0 | 0.460492 | 5 |
three_layer_1_hour | 0.565579 | 5 | 0.320999 | 5 | 0.926491 | 0 | 0.460452 | 5 |
two_layer_dropout_1_hour | 0.515579 | 0 | 0.299655 | 0 | 0.917245 | 3 | 0.443265 | 8 |
two_layer_1_hour | 0.515579 | 0 | 0.299655 | 0 | 0.917245 | 3 | 0.442980 | 8 |
three_layer_24_hour | 0.564419 | 2 | 0.281820 | 4 | 0.972609 | 9 | 0.410888 | 14 |
three_layer_dropout_24_hour | 0.475150 | 2 | 0.250718 | 6 | 0.991643 | 19 | 0.387369 | 6 |
The above indicates that the baseline model with a look back period of 24 hours performed the best in terms of accuracy and the F1 statistic (on the validation data). This model structure will be selected as the final model for hyperparameter tuning.
Hypertuning Strategy¶
Having determined the model structure that results in the highest F1 score, the hyperparameters of the model are tuned in order to try to improve performance. Using Keras Tuner for this task, the first decision is the choice of tuning algorithm.
Tuner | Description |
---|---|
RandomSearch | Chooses hyperparameters at random. Computationally expensive and completely random (as the name suggests) in finding a best set of hyperparameters. |
GridSearch | Similar to the grid search in Scikit Learn, this tuner attempts all possible combinations of hyperparameters to find the best. Again, computationally (very) expensive. |
BayesianOptimization | Uses a Bayesian approach to optimization with a Gaussian process. |
Hyperband | The Keras Tuner implementation of Hyperband. This is a novel bandit-based approach to optimization based on the paper by Li, Lisha, and Kevin Jamieson, "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". |
According to the algorithm's authors, Hyperband offers performance advantages over the other available tuning algorithms. This is the tuner I have used to tune the model.
The parameters to be tuned are:
| Parameter | Description |
|---|---|
| Units in LSTM layer | The number of units for the LSTM layer |
| Activation function | A choice of relu, elu, tanh, sigmoid or selu |
| Learning rate | The learning rate of the gradient descent. A float between 0.0005 and 0.01 |
| Beta 1 | The exponential decay rate for the first moment estimates. A float between 0.5 and 0.99 |
| Beta 2 | The exponential decay rate for the second-moment estimates. Generally set close to 1. A float between 0.5 and 0.9999 |
Given the validation loss of the model seems to minimise at around epoch 7, the max epochs parameter for the tuner was set at 15. This allows enough headroom for testing hyperparameters without being overly wasteful of resources.
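The sketch below shows how such a Hyperband search could be wired up. The search ranges follow the table above; the objective, the project name, the `N_FEATURES` placeholder and the dataset names are assumptions.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, N_FEATURES = 24, 100   # 24-hour lookback; feature count is a placeholder

def build_model(hp):
    model = keras.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.LSTM(hp.Int("units", min_value=32, max_value=512),
                    activation=hp.Choice("activation",
                                         ["relu", "elu", "tanh", "sigmoid", "selu"])),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            learning_rate=hp.Float("learning_rate", 0.0005, 0.01, sampling="log"),
            beta_1=hp.Float("beta_1", 0.5, 0.99),
            beta_2=hp.Float("beta_2", 0.5, 0.9999)),
        loss="binary_crossentropy",
        metrics=[keras.metrics.Precision(name="precision"),
                 keras.metrics.Recall(name="recall")])
    return model

tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=15,
                     project_name="eth_lstm_tuning")
tuner.search(train_ds, validation_data=val_ds, class_weight=class_weight)
```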
After the tuning process, trial 28 was chosen and the final parameters were:
Parameter | Value |
---|---|
Units | 400 |
Activation | tanh |
Learning rate | 0.00058215 |
Both betas were tuned, but the tuned values were lost in the output when generating the model. All the results of the hypertuning can be found in Appendix 4.
Final Model¶
Model Evaluation¶
To evaluate the model, it is tested on the test data which has been held back. The scaled data is reimported and put through the pairwise correlation and Boruta pipeline. The resulting features are converted to a tensorflow Dataset using the timeseries utility function, giving the same dataset structure as used in the experiments described above.
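A minimal sketch of building the held-back test dataset and evaluating against it; `X_test` and `y_test` are assumed names, and the batch size is an assumption.

```python
import tensorflow as tf

SEQ_LEN = 24   # lookback of the final model

# Each window of 24 hours is paired with the label of its final timestep.
test_ds = tf.keras.utils.timeseries_dataset_from_array(
    data=X_test.values,
    targets=y_test.values[SEQ_LEN - 1:],
    sequence_length=SEQ_LEN,
    batch_size=64,
    shuffle=False,
)
results = model.evaluate(test_ds)
```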
  | retained_feature |
---|---|
0 | open |
1 | high |
2 | ABER_XG_5_15 |
3 | ACCBL_20 |
4 | ACCBU_20 |
5 | ADX_14 |
6 | AGj_13_8_5 |
7 | ALPHAT_14_1_50 |
8 | OBV_min_2 |
9 | OBV_max_2 |
10 | AROONOSC_14 |
11 | BBU_5_2.0 |
12 | BIAS_SMA_26 |
13 | BOP |
14 | PIVOTS_TRAD_D_R1 |
15 | UO_7_14_28 |
It is interesting to note, when examining the retained features, that the fear and greed index data and the seasonality data have been dropped. This indicates that these features may need some additional transformations in order to have a significant impact on the model, or that they are not as useful in predicting short-term movements as other features.
It is also interesting to note that the candlestick patterns are not included either. There goes hundreds of years of Japanese know how!
The remaining features seem to be a good mix of raw data, momentum indicators and trend indicators. The function names can be found at https://ta-lib.org/functions/ and there are volumes of information on their definitions and usage available online.
  | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.94 | 0.81 | 0.87 | 11169 |
1 | 0.31 | 0.64 | 0.42 | 1475 |
accuracy | | | 0.79 | 12644 |
macro avg | 0.63 | 0.73 | 0.64 | 12644 |
weighted avg | 0.87 | 0.79 | 0.82 | 12644 |
The above indicates quite encouraging results for the model when it is tested on the test data. The model predicts the 0 class well but struggles more with class 1. For the use case of this model, whereby a person would use these signals to enter or exit a trade, it could still be useful, as any incorrect position would quickly be exited on a correct prediction of class 0.
The weighted average F1 score also indicates a model with decent predictive power. The results are quite encouraging!
In constructing the confusion matrix, the values have been normalised due to the class imbalance. This generally gives a better understanding of how the model is predicting each class in these cases.
The results are promising, indicating that the model has a relatively small proportion of false positives and false negatives. As false positives are more likely to result in actual loss of investment value, this is encouraging. The model appears to perform relatively well, all things considered.
$\text{True Positive Rate} = Recall$
$\text{False Positive Rate} = \frac{\text{False Positive}}{\text{True Negative} + \text{False Positive}}$
The red line on the above diagram represents the situation where the true positive rate is equal to the false positive rate. Points above this line indicate where the proportion of correctly classified points belonging to the positive class is greater than the proportion of incorrectly classified points belonging to the negative class.
A perfect model that correctly classified everything would have the elbow point at the coordinates (0, 1). The area under the curve (AUC) provides an aggregate measure of performance across all possible classification thresholds. An AUC above 0.5 indicates that the model has some predictive power over a random choice. In short, the closer the ROC curve hugs the top-left corner, the better the predictive power of the model.
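A minimal sketch of the ROC/AUC calculation on the held-back data; `y_true` (the test labels aligned with the windows) and `test_ds` are assumed from the evaluation step above.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probabilities for class 1, then the ROC curve and its area.
y_prob = model.predict(test_ds).ravel()
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)
```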
An AUC of 0.73 is not bad and indicates that the model has moderate predictive power, which will be tested further in the backtesting section.
All in all, the model appears to be well regularized and performs well on the test data. Finally, a backtest needs to be conducted to simulate how the predictions perform as signals for trading decisions.
Backtesting¶
Approaches to backtesting can vary, but I have implemented a long-only strategy. This will be compared to a long hold from the beginning of the period.
A few assumptions have been made when considering the strategy:
- That the minimum holding period will be for the duration of the timestep (i.e. 1 hour in this case)
- That a long position can be taken in the security at the close price from the previous step, thus locking in the maximum return
- That the investor operating the strategy can borrow for free or has access to cash to continue taking positions when all funds are lost
These assumptions are not appropriate in a real market situation and should be considered further should this strategy ever be used in production.
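A minimal sketch of the long-only backtest under these assumptions; `returns` (hourly log returns) and `predictions` (model signals on the same index) are assumed to exist, and the column names mirror Appendix 5.

```python
import pandas as pd

bt = pd.DataFrame({"return": returns, "predictions": predictions})

# Buy-and-hold benchmark: cumulative sum of log returns.
bt["long_only_hold"] = bt["return"].cumsum()

# A signal at hour t means a position is entered at that close and held through hour t+1.
bt["holding_begin_period"] = bt["predictions"].shift(1).fillna(0)
bt["holding_end_period"] = bt["predictions"]

# Strategy earns the hourly return only while a position is held.
bt["strategy_return"] = (bt["return"] * bt["holding_begin_period"]).cumsum()
```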
Wow! I'm going to be a billionaire!
But seriously, the backtesting does appear to indicate that the model is very good at generating a consistent return. However, significantly more testing would need to be undertaken before using the model 'in the wild' with real-time market data, order books and real money. The above also indicates that there may be some data leakage in the model, which should be investigated as part of any further analysis.
Arguably, the reason for the good backtest is that downward movements are predicted very well (0.94 precision), meaning any mistakes are quickly rectified. Combined with how small hourly returns generally are, any incorrect position would quickly be closed out. Transaction costs would need to be built into any production model to better assess whether the strategy would work.
The first 100 results of the strategy can be found in Appendix 5.
Conclusions¶
The final model failed to beat the sensible baseline, missing it by 1.64%. However, the confusion matrix and AUC metrics indicate the model performs well in predicting true positives and true negatives, indicating a good level of performance. It also did relatively well when real investment actions were simulated (subject to further testing), so the model could warrant further analysis incorporating more real-world trading conditions.
It is interesting to note that the fear and greed index ultimately had no impact on the analysis, potentially indicating that there is value in ignoring the noise and sticking to a dispassionate strategy.
Thank you for reading.
Appendices¶
Appendix 1 - Process Diagram¶
Appendix 2 - Features and Scalers¶
scaler | feature |
---|---|
RobustScaler | ABER_ATR_5_15 |
MinMaxScaler | ABER_SG_5_15 |
MinMaxScaler | ABER_XG_5_15 |
MinMaxScaler | ABER_ZG_5_15 |
MinMaxScaler | ACCBL_20 |
MinMaxScaler | ACCBM_20 |
MinMaxScaler | ACCBU_20 |
MinMaxScaler | AD |
RobustScaler | ADOSC_3_10 |
MinMaxScaler | ADXR_14_2 |
MinMaxScaler | ADX_14 |
MinMaxScaler | AGj_13_8_5 |
MinMaxScaler | AGl_13_8_5 |
MinMaxScaler | AGt_13_8_5 |
MinMaxScaler | ALMA_9_6.0_0.85 |
MinMaxScaler | ALPHAT_14_1_50 |
MinMaxScaler | ALPHATl_14_1_50_2 |
MinMaxScaler | AMATe_LR_8_21_2 |
MinMaxScaler | AMATe_SR_8_21_2 |
MinMaxScaler | AOBV_LR_2 |
MinMaxScaler | AOBV_SR_2 |
RobustScaler | AO_5_34 |
RobustScaler | APO_12_26 |
MinMaxScaler | AROOND_14 |
MinMaxScaler | AROONOSC_14 |
MinMaxScaler | AROONU_14 |
MinMaxScaler | AR_26 |
MinMaxScaler | ATRTSe_14_20_3.0 |
RobustScaler | ATRr_14 |
RobustScaler | BBB_5_2.0 |
MinMaxScaler | BBL_5_2.0 |
MinMaxScaler | BBM_5_2.0 |
RobustScaler | BBP_5_2.0 |
MinMaxScaler | BBU_5_2.0 |
RobustScaler | BEARP_13 |
RobustScaler | BIAS_SMA_26 |
MinMaxScaler | BOP |
MinMaxScaler | BR_26 |
RobustScaler | BULLP_13 |
MinMaxScaler | CCI_14_0.015 |
MinMaxScaler | CDL_3WHITESOLDIERS |
MinMaxScaler | CDL_ADVANCEBLOCK |
MinMaxScaler | CDL_BELTHOLD |
MinMaxScaler | CDL_CLOSINGMARUBOZU |
MinMaxScaler | CDL_DOJI_10_0.1 |
MinMaxScaler | CDL_DRAGONFLYDOJI |
MinMaxScaler | CDL_GRAVESTONEDOJI |
MinMaxScaler | CDL_HAMMER |
MinMaxScaler | CDL_HANGINGMAN |
MinMaxScaler | CDL_HIGHWAVE |
MinMaxScaler | CDL_HIKKAKE |
MinMaxScaler | CDL_HIKKAKEMOD |
MinMaxScaler | CDL_IDENTICAL3CROWS |
MinMaxScaler | CDL_INSIDE |
MinMaxScaler | CDL_LONGLEGGEDDOJI |
MinMaxScaler | CDL_LONGLINE |
MinMaxScaler | CDL_MARUBOZU |
MinMaxScaler | CDL_MATCHINGLOW |
MinMaxScaler | CDL_RICKSHAWMAN |
MinMaxScaler | CDL_SEPARATINGLINES |
MinMaxScaler | CDL_SHORTLINE |
MinMaxScaler | CDL_SPINNINGTOP |
MinMaxScaler | CDL_STALLEDPATTERN |
MinMaxScaler | CDL_TAKURI |
RobustScaler | CFO_9 |
RobustScaler | CG_10 |
MinMaxScaler | CHDLREXTd_22_22_14_2.0 |
MinMaxScaler | CHDLREXTl_22_22_14_2.0 |
MinMaxScaler | CHDLREXTs_22_22_14_2.0 |
MinMaxScaler | CHOP_14_1_100.0 |
MinMaxScaler | CKSPl_10_3_20 |
MinMaxScaler | CKSPs_10_3_20 |
MinMaxScaler | CMF_20 |
MinMaxScaler | CMO_14 |
MinMaxScaler | COPC_11_14_10 |
MinMaxScaler | CRSI_3_2_100 |
MinMaxScaler | CTI_12 |
RobustScaler | CUBE_3.0_-1 |
RobustScaler | CUBEs_3.0_-1 |
MinMaxScaler | DCL_20_20 |
MinMaxScaler | DCM_20_20 |
MinMaxScaler | DCU_20_20 |
MinMaxScaler | DEC_1 |
MinMaxScaler | DEMA_10 |
RobustScaler | DMN_14 |
RobustScaler | DMP_14 |
RobustScaler | DPO_20 |
MinMaxScaler | D_9_3 |
MinMaxScaler | EBSW_40_10 |
RobustScaler | EFI_13 |
MinMaxScaler | EMA_10 |
RobustScaler | ENTP_10 |
MinMaxScaler | ER_10 |
MinMaxScaler | FAMA_0.5_0.05 |
MinMaxScaler | FISHERT_9_1 |
MinMaxScaler | FISHERTs_9_1 |
MinMaxScaler | FWMA_10 |
MinMaxScaler | HA_close |
MinMaxScaler | HA_high |
MinMaxScaler | HA_low |
MinMaxScaler | HA_open |
MinMaxScaler | HILO_13_21 |
MinMaxScaler | HL2 |
MinMaxScaler | HLC3 |
MinMaxScaler | HMA_10 |
MinMaxScaler | HWL_1 |
MinMaxScaler | HWMA_0.2_0.1_0.1 |
MinMaxScaler | HWM_1 |
MinMaxScaler | HWU_1 |
MinMaxScaler | ICS_26 |
MinMaxScaler | IKS_26 |
MinMaxScaler | INC_1 |
MinMaxScaler | INERTIA_20_14 |
MinMaxScaler | INVFISHER_1.0 |
MinMaxScaler | INVFISHERs_1.0 |
MinMaxScaler | ISA_9 |
MinMaxScaler | ISB_26 |
MinMaxScaler | ITS_9 |
MinMaxScaler | JMA_7_0.0 |
MinMaxScaler | J_9_3 |
MinMaxScaler | KAMA_10_2_30 |
MinMaxScaler | KCBe_20_2 |
MinMaxScaler | KCLe_20_2 |
MinMaxScaler | KCUe_20_2 |
MinMaxScaler | KST_10_15_20_30_10_10_10_15 |
MinMaxScaler | KSTs_9 |
RobustScaler | KURT_30 |
RobustScaler | KVO_34_55_13 |
RobustScaler | KVOs_34_55_13 |
MinMaxScaler | K_9_3 |
MinMaxScaler | LDECAY_1 |
MinMaxScaler | LINREG_14 |
RobustScaler | LOGRET_1 |
RobustScaler | MACD_12_26_9 |
RobustScaler | MACDh_12_26_9 |
RobustScaler | MACDs_12_26_9 |
RobustScaler | MAD_30 |
MinMaxScaler | MAMA_0.5_0.05 |
MinMaxScaler | MASSI_9_25 |
MinMaxScaler | MCGD_10 |
MinMaxScaler | MEDIAN_30 |
MinMaxScaler | MFI_14 |
MinMaxScaler | MIDPOINT_2 |
MinMaxScaler | MIDPRICE_2 |
RobustScaler | MOM_10 |
RobustScaler | NATR_14 |
MinMaxScaler | NVI_1 |
MinMaxScaler | OBV |
MinMaxScaler | OBV_max_2 |
MinMaxScaler | OBV_min_2 |
MinMaxScaler | OBVe_12 |
MinMaxScaler | OBVe_4 |
MinMaxScaler | OHLC4 |
RobustScaler | PCTRET_1 |
RobustScaler | PDIST |
MinMaxScaler | PGO_14 |
MinMaxScaler | PIVOTS_TRAD_D_P |
MinMaxScaler | PIVOTS_TRAD_D_R1 |
MinMaxScaler | PIVOTS_TRAD_D_R2 |
MinMaxScaler | PIVOTS_TRAD_D_R3 |
MinMaxScaler | PIVOTS_TRAD_D_R4 |
MinMaxScaler | PIVOTS_TRAD_D_S1 |
MinMaxScaler | PIVOTS_TRAD_D_S2 |
MinMaxScaler | PIVOTS_TRAD_D_S3 |
MinMaxScaler | PIVOTS_TRAD_D_S4 |
RobustScaler | PPO_12_26_9 |
RobustScaler | PPOh_12_26_9 |
RobustScaler | PPOs_12_26_9 |
MinMaxScaler | PSARaf_0.02_0.2 |
MinMaxScaler | PSARr_0.02_0.2 |
MinMaxScaler | PSL_12 |
MinMaxScaler | PVI_1 |
RobustScaler | PVOL |
MinMaxScaler | PVO_12_26_9 |
MinMaxScaler | PVOh_12_26_9 |
MinMaxScaler | PVOs_12_26_9 |
MinMaxScaler | PVR |
MinMaxScaler | PVT |
MinMaxScaler | PWMA_10 |
MinMaxScaler | QQE_14_5_4.236 |
MinMaxScaler | QQE_14_5_4.236_RSIMA |
RobustScaler | QS_10 |
MinMaxScaler | QTL_30_0.5 |
MinMaxScaler | REFLEX_20_20_0.04 |
MinMaxScaler | REMAP_0.0_100.0_-1.0_1.0 |
MinMaxScaler | RMA_10 |
RobustScaler | ROC_10 |
MinMaxScaler | RSI_14 |
MinMaxScaler | RSX_14 |
MinMaxScaler | RVGI_14_4 |
MinMaxScaler | RVGIs_14_4 |
MinMaxScaler | RVI_14 |
MinMaxScaler | RWIh_14 |
MinMaxScaler | RWIl_14 |
MinMaxScaler | SINWMA_14 |
MinMaxScaler | SKEW_30 |
RobustScaler | SLOPE_1 |
MinMaxScaler | SMA_10 |
MinMaxScaler | SMI_5_20_5_1.0 |
MinMaxScaler | SMIo_5_20_5_1.0 |
MinMaxScaler | SMIs_5_20_5_1.0 |
MinMaxScaler | SMMA_7 |
RobustScaler | SQZPRO_20_2.0_20_2.0_1.5_1.0 |
MinMaxScaler | SQZPRO_OFF |
MinMaxScaler | SQZPRO_ON_NARROW |
MinMaxScaler | SQZPRO_ON_NORMAL |
MinMaxScaler | SQZPRO_ON_WIDE |
RobustScaler | SQZ_20_2.0_20_1.5 |
MinMaxScaler | SQZ_OFF |
MinMaxScaler | SQZ_ON |
MinMaxScaler | SSF3_20 |
MinMaxScaler | SSF_20 |
MinMaxScaler | STC_10_12_26_0.5 |
RobustScaler | STCmacd_10_12_26_0.5 |
MinMaxScaler | STCstoch_10_12_26_0.5 |
RobustScaler | STDEV_30 |
MinMaxScaler | STOCHFd_14_3 |
MinMaxScaler | STOCHFk_14_3 |
MinMaxScaler | STOCHRSId_14_14_3_3 |
MinMaxScaler | STOCHRSIk_14_14_3_3 |
MinMaxScaler | STOCHd_14_3_3 |
MinMaxScaler | STOCHh_14_3_3 |
MinMaxScaler | STOCHk_14_3_3 |
MinMaxScaler | SUPERT_7_3.0 |
MinMaxScaler | SUPERTd_7_3.0 |
MinMaxScaler | SWMA_10 |
MinMaxScaler | T3_10_0.7 |
MinMaxScaler | TEMA_10 |
RobustScaler | THERMO_20_2_0.5 |
MinMaxScaler | THERMOl_20_2_0.5 |
RobustScaler | THERMOma_20_2_0.5 |
MinMaxScaler | THERMOs_20_2_0.5 |
MinMaxScaler | TMO_14_5_3 |
MinMaxScaler | TMOs_14_5_3 |
MinMaxScaler | TOS_STDEVALL_LR |
MinMaxScaler | TOS_STDEVALL_L_1 |
MinMaxScaler | TOS_STDEVALL_L_2 |
MinMaxScaler | TOS_STDEVALL_L_3 |
MinMaxScaler | TOS_STDEVALL_U_1 |
MinMaxScaler | TOS_STDEVALL_U_2 |
MinMaxScaler | TOS_STDEVALL_U_3 |
MinMaxScaler | TRENDFLEX_20_20_0.04 |
MinMaxScaler | TRIMA_10 |
MinMaxScaler | TRIX_30_9 |
MinMaxScaler | TRIXs_30_9 |
RobustScaler | TRUERANGE_1 |
MinMaxScaler | TSI_13_25_13 |
MinMaxScaler | TSIs_13_25_13 |
RobustScaler | TSV_18_10 |
RobustScaler | TSVr_18_10 |
RobustScaler | TSVs_18_10 |
MinMaxScaler | TTM_TRND_6 |
RobustScaler | UI_14 |
MinMaxScaler | UO_7_14_28 |
RobustScaler | VAR_30 |
MinMaxScaler | VHF_28 |
RobustScaler | VHM_610 |
MinMaxScaler | VTXM_14 |
MinMaxScaler | VTXP_14 |
MinMaxScaler | VWAP_D |
MinMaxScaler | VWMA_10 |
MinMaxScaler | WCP |
MinMaxScaler | WILLR_14 |
MinMaxScaler | WMA_10 |
MinMaxScaler | ZL_EMA_10 |
MinMaxScaler | ZS_30 |
MinMaxScaler | close |
MinMaxScaler | close_Z_30_1 |
MinMaxScaler | day_of_week_0 |
MinMaxScaler | day_of_week_1 |
MinMaxScaler | day_of_week_2 |
MinMaxScaler | day_of_week_3 |
MinMaxScaler | day_of_week_4 |
MinMaxScaler | day_of_week_5 |
MinMaxScaler | day_of_week_6 |
MinMaxScaler | fg_value |
MinMaxScaler | fg_value_classification_extreme fear |
MinMaxScaler | fg_value_classification_extreme greed |
MinMaxScaler | fg_value_classification_fear |
MinMaxScaler | fg_value_classification_greed |
MinMaxScaler | fg_value_classification_neutral |
MinMaxScaler | high |
MinMaxScaler | high_Z_30_1 |
MinMaxScaler | hour_0 |
MinMaxScaler | hour_1 |
MinMaxScaler | hour_10 |
MinMaxScaler | hour_11 |
MinMaxScaler | hour_12 |
MinMaxScaler | hour_13 |
MinMaxScaler | hour_14 |
MinMaxScaler | hour_15 |
MinMaxScaler | hour_16 |
MinMaxScaler | hour_17 |
MinMaxScaler | hour_18 |
MinMaxScaler | hour_19 |
MinMaxScaler | hour_2 |
MinMaxScaler | hour_20 |
MinMaxScaler | hour_21 |
MinMaxScaler | hour_22 |
MinMaxScaler | hour_23 |
MinMaxScaler | hour_3 |
MinMaxScaler | hour_4 |
MinMaxScaler | hour_5 |
MinMaxScaler | hour_6 |
MinMaxScaler | hour_7 |
MinMaxScaler | hour_8 |
MinMaxScaler | hour_9 |
MinMaxScaler | low |
MinMaxScaler | low_Z_30_1 |
MinMaxScaler | month_1 |
MinMaxScaler | month_10 |
MinMaxScaler | month_11 |
MinMaxScaler | month_12 |
MinMaxScaler | month_2 |
MinMaxScaler | month_3 |
MinMaxScaler | month_4 |
MinMaxScaler | month_5 |
MinMaxScaler | month_6 |
MinMaxScaler | month_7 |
MinMaxScaler | month_8 |
MinMaxScaler | month_9 |
MinMaxScaler | open |
MinMaxScaler | open_Z_30_1 |
RobustScaler | volume |
Appendix 3 - UMAP Parameters¶
Appendix 4 - Hypertuning Results¶
run_name | max_binary_accuracy | max_binary_accuracy_epoch | max_precision | max_precision_epoch | max_recall | max_recall_epoch | max_f1 | max_f1_epoch |
---|---|---|---|---|---|---|---|---|
baseline_24_hour_0028 | 0.680384 | 12 | 0.387734 | 12 | 0.863510 | 4 | 0.499422 | 12 |
baseline_24_hour_0022 | 0.729239 | 4 | 0.419468 | 4 | 0.751625 | 0 | 0.490235 | 0 |
baseline_24_hour_0002 | 0.656220 | 1 | 0.367180 | 1 | 0.766945 | 0 | 0.483677 | 1 |
baseline_24_hour_0012 | 0.739897 | 1 | 0.435823 | 1 | 0.592386 | 0 | 0.481419 | 0 |
baseline_24_hour_0013 | 0.620977 | 1 | 0.349266 | 1 | 0.888115 | 0 | 0.481225 | 1 |
baseline_24_hour_0017 | 0.634695 | 5 | 0.354291 | 6 | 0.832869 | 0 | 0.480335 | 6 |
baseline_24_hour_0027 | 0.750448 | 1 | 0.449738 | 1 | 0.805014 | 10 | 0.476830 | 2 |
baseline_24_hour_0016 | 0.726918 | 7 | 0.419985 | 7 | 0.708449 | 4 | 0.476242 | 8 |
baseline_24_hour_0011 | 0.601456 | 0 | 0.339275 | 0 | 0.888115 | 1 | 0.475635 | 0 |
baseline_24_hour_0025 | 0.593753 | 5 | 0.336167 | 5 | 0.887187 | 1 | 0.474761 | 5 |
baseline_24_hour_0024 | 0.727551 | 3 | 0.420328 | 3 | 0.608171 | 2 | 0.466529 | 3 |
baseline_24_hour_0003 | 0.578981 | 1 | 0.326399 | 1 | 0.880687 | 0 | 0.463854 | 1 |
baseline_24_hour_0019 | 0.541522 | 3 | 0.313341 | 3 | 0.887187 | 4 | 0.458432 | 3 |
baseline_24_hour_0029 | 0.727973 | 2 | 0.396787 | 2 | 0.735376 | 4 | 0.456958 | 1 |
baseline_24_hour_0007 | 0.655587 | 1 | 0.355091 | 1 | 0.631383 | 1 | 0.454545 | 1 |
baseline_24_hour_0014 | 0.576765 | 2 | 0.318535 | 0 | 0.837512 | 1 | 0.452256 | 0 |
baseline_24_hour_0006 | 0.514298 | 1 | 0.302468 | 1 | 0.882544 | 0 | 0.448941 | 1 |
baseline_24_hour_0026 | 0.505434 | 0 | 0.298745 | 0 | 0.908542 | 9 | 0.445128 | 0 |
baseline_24_hour_0018 | 0.546164 | 4 | 0.303816 | 0 | 0.779480 | 2 | 0.436116 | 0 |
baseline_24_hour_0020 | 0.591326 | 4 | 0.306811 | 4 | 0.894150 | 0 | 0.434435 | 3 |
baseline_24_hour_0015 | 0.523267 | 0 | 0.296768 | 0 | 0.801300 | 0 | 0.433124 | 0 |
baseline_24_hour_0021 | 0.479582 | 3 | 0.286308 | 3 | 0.897864 | 4 | 0.430090 | 3 |
baseline_24_hour_0023 | 0.671204 | 2 | 0.354595 | 2 | 0.891829 | 0 | 0.429513 | 2 |
baseline_24_hour_0004 | 0.523794 | 0 | 0.287439 | 0 | 0.740483 | 0 | 0.414124 | 0 |
baseline_24_hour_0001 | 0.561465 | 0 | 0.296793 | 0 | 0.954967 | 1 | 0.412994 | 0 |
baseline_24_hour_0005 | 0.728606 | 0 | 0.340214 | 0 | 0.591922 | 1 | 0.407739 | 1 |
baseline_24_hour_0000 | 0.532658 | 0 | 0.273812 | 0 | 0.808728 | 1 | 0.407390 | 1 |
baseline_24_hour_0010 | 0.700327 | 0 | 0.369482 | 0 | 0.450789 | 0 | 0.406106 | 0 |
baseline_24_hour_0009 | 0.457951 | 0 | 0.251768 | 1 | 0.776695 | 1 | 0.380271 | 1 |
baseline_24_hour_0008 | 0.408674 | 1 | 0.226010 | 0 | 0.849582 | 0 | 0.357038 | 0 |
Appendix 5 - Strategy Results¶
unix | return | long_only_hold | predictions | holding_begin_period | holding_end_period | strategy_return |
---|---|---|---|---|---|---|
2018-02-27 08:00:00+00:00 | 0.002905 | 0.002905 | 0 | 0.0 | 0 | 0.000000 |
2018-02-27 09:00:00+00:00 | -0.002501 | 0.000404 | 0 | 0.0 | 0 | 0.000000 |
2018-02-27 10:00:00+00:00 | -0.014487 | -0.014083 | 1 | 0.0 | 1 | 0.000000 |
2018-02-27 11:00:00+00:00 | 0.005351 | -0.008732 | 0 | 1.0 | 0 | 0.005351 |
2018-02-27 12:00:00+00:00 | -0.000102 | -0.008834 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 13:00:00+00:00 | 0.007395 | -0.001439 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 14:00:00+00:00 | -0.010175 | -0.011614 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 15:00:00+00:00 | -0.011429 | -0.023043 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 16:00:00+00:00 | 0.004129 | -0.018913 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 17:00:00+00:00 | -0.005636 | -0.024550 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 18:00:00+00:00 | 0.004113 | -0.020437 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 19:00:00+00:00 | 0.005157 | -0.015280 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 20:00:00+00:00 | 0.000057 | -0.015223 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 21:00:00+00:00 | 0.006820 | -0.008403 | 0 | 0.0 | 0 | 0.005351 |
2018-02-27 22:00:00+00:00 | -0.009023 | -0.017426 | 1 | 0.0 | 1 | 0.005351 |
2018-02-27 23:00:00+00:00 | -0.004112 | -0.021538 | 1 | 1.0 | 1 | 0.001239 |
2018-02-28 00:00:00+00:00 | 0.006292 | -0.015246 | 1 | 1.0 | 1 | 0.007532 |
2018-02-28 01:00:00+00:00 | 0.004858 | -0.010388 | 1 | 1.0 | 1 | 0.012390 |
2018-02-28 02:00:00+00:00 | -0.000897 | -0.011285 | 1 | 1.0 | 1 | 0.011493 |
2018-02-28 03:00:00+00:00 | 0.007582 | -0.003702 | 0 | 1.0 | 0 | 0.019075 |
2018-02-28 04:00:00+00:00 | -0.004769 | -0.008471 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 05:00:00+00:00 | 0.000668 | -0.007803 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 06:00:00+00:00 | -0.008709 | -0.016512 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 07:00:00+00:00 | -0.000845 | -0.017358 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 08:00:00+00:00 | -0.013752 | -0.031110 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 09:00:00+00:00 | 0.003586 | -0.027524 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 10:00:00+00:00 | -0.002150 | -0.029674 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 11:00:00+00:00 | 0.001052 | -0.028622 | 0 | 0.0 | 0 | 0.019075 |
2018-02-28 12:00:00+00:00 | -0.007367 | -0.035988 | 1 | 0.0 | 1 | 0.019075 |
2018-02-28 13:00:00+00:00 | 0.007182 | -0.028807 | 0 | 1.0 | 0 | 0.026257 |
2018-02-28 14:00:00+00:00 | 0.005074 | -0.023733 | 0 | 0.0 | 0 | 0.026257 |
2018-02-28 15:00:00+00:00 | -0.001877 | -0.025609 | 0 | 0.0 | 0 | 0.026257 |
2018-02-28 16:00:00+00:00 | -0.004724 | -0.030334 | 1 | 0.0 | 1 | 0.026257 |
2018-02-28 17:00:00+00:00 | 0.002844 | -0.027489 | 1 | 1.0 | 1 | 0.029101 |
2018-02-28 18:00:00+00:00 | -0.000323 | -0.027813 | 1 | 1.0 | 1 | 0.028778 |
2018-02-28 19:00:00+00:00 | 0.002457 | -0.025356 | 0 | 1.0 | 0 | 0.031235 |
2018-02-28 20:00:00+00:00 | 0.001669 | -0.023687 | 0 | 0.0 | 0 | 0.031235 |
2018-02-28 21:00:00+00:00 | -0.003191 | -0.026878 | 0 | 0.0 | 0 | 0.031235 |
2018-02-28 22:00:00+00:00 | -0.008936 | -0.035814 | 0 | 0.0 | 0 | 0.031235 |
2018-02-28 23:00:00+00:00 | -0.010710 | -0.046523 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 00:00:00+00:00 | 0.005422 | -0.041101 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 01:00:00+00:00 | 0.001883 | -0.039219 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 02:00:00+00:00 | -0.003429 | -0.042648 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 03:00:00+00:00 | 0.002891 | -0.039756 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 04:00:00+00:00 | 0.002894 | -0.036862 | 0 | 0.0 | 0 | 0.031235 |
2018-03-01 05:00:00+00:00 | -0.000245 | -0.037107 | 1 | 0.0 | 1 | 0.031235 |
2018-03-01 06:00:00+00:00 | 0.000909 | -0.036198 | 1 | 1.0 | 1 | 0.032144 |
2018-03-01 07:00:00+00:00 | 0.005077 | -0.031121 | 1 | 1.0 | 1 | 0.037220 |
2018-03-01 08:00:00+00:00 | 0.005512 | -0.025609 | 1 | 1.0 | 1 | 0.042732 |
2018-03-01 09:00:00+00:00 | -0.003590 | -0.029200 | 1 | 1.0 | 1 | 0.039142 |
2018-03-01 10:00:00+00:00 | 0.006731 | -0.022468 | 0 | 1.0 | 0 | 0.045873 |
2018-03-01 11:00:00+00:00 | -0.001932 | -0.024400 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 12:00:00+00:00 | -0.006420 | -0.030820 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 13:00:00+00:00 | 0.003965 | -0.026855 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 14:00:00+00:00 | -0.001397 | -0.028252 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 15:00:00+00:00 | -0.004006 | -0.032258 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 16:00:00+00:00 | 0.004606 | -0.027651 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 17:00:00+00:00 | 0.008039 | -0.019612 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 18:00:00+00:00 | 0.004560 | -0.015052 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 19:00:00+00:00 | -0.004961 | -0.020013 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 20:00:00+00:00 | 0.004642 | -0.015371 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 21:00:00+00:00 | -0.006523 | -0.021894 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 22:00:00+00:00 | -0.001298 | -0.023192 | 0 | 0.0 | 0 | 0.045873 |
2018-03-01 23:00:00+00:00 | -0.001288 | -0.024481 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 00:00:00+00:00 | 0.000288 | -0.024193 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 01:00:00+00:00 | 0.004054 | -0.020139 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 02:00:00+00:00 | 0.000023 | -0.020116 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 03:00:00+00:00 | -0.002008 | -0.022124 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 04:00:00+00:00 | -0.002069 | -0.024193 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 05:00:00+00:00 | 0.001104 | -0.023089 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 06:00:00+00:00 | 0.002124 | -0.020965 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 07:00:00+00:00 | -0.003240 | -0.024204 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 08:00:00+00:00 | 0.004180 | -0.020024 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 09:00:00+00:00 | -0.011595 | -0.031620 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 10:00:00+00:00 | -0.002612 | -0.034232 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 11:00:00+00:00 | -0.000581 | -0.034813 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 12:00:00+00:00 | 0.003622 | -0.031191 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 13:00:00+00:00 | -0.000429 | -0.031620 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 14:00:00+00:00 | -0.011532 | -0.043152 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 15:00:00+00:00 | 0.000867 | -0.042284 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 16:00:00+00:00 | 0.000679 | -0.041605 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 17:00:00+00:00 | -0.000375 | -0.041980 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 18:00:00+00:00 | -0.001313 | -0.043292 | 0 | 0.0 | 0 | 0.045873 |
2018-03-02 19:00:00+00:00 | -0.004679 | -0.047972 | 1 | 0.0 | 1 | 0.045873 |
2018-03-02 20:00:00+00:00 | 0.007713 | -0.040259 | 0 | 1.0 | 0 | 0.053586 |
2018-03-02 21:00:00+00:00 | 0.004620 | -0.035639 | 0 | 0.0 | 0 | 0.053586 |
2018-03-02 22:00:00+00:00 | -0.000699 | -0.036338 | 0 | 0.0 | 0 | 0.053586 |
2018-03-02 23:00:00+00:00 | -0.003746 | -0.040084 | 1 | 0.0 | 1 | 0.053586 |
2018-03-03 00:00:00+00:00 | 0.003105 | -0.036979 | 1 | 1.0 | 1 | 0.056691 |
2018-03-03 01:00:00+00:00 | 0.007524 | -0.029454 | 0 | 1.0 | 0 | 0.064216 |
2018-03-03 02:00:00+00:00 | -0.001076 | -0.030531 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 03:00:00+00:00 | -0.004085 | -0.034615 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 04:00:00+00:00 | 0.003135 | -0.031481 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 05:00:00+00:00 | -0.002065 | -0.033546 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 06:00:00+00:00 | 0.001010 | -0.032536 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 07:00:00+00:00 | -0.004349 | -0.036885 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 08:00:00+00:00 | 0.003792 | -0.033093 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 09:00:00+00:00 | -0.001580 | -0.034673 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 10:00:00+00:00 | 0.000953 | -0.033720 | 0 | 0.0 | 0 | 0.064216 |
2018-03-03 11:00:00+00:00 | 0.001045 | -0.032675 | 0 | 0.0 | 0 | 0.064216 |