CQF June 2023 Intake: Final Project

Deep Learning for Asset Prediction

Will Colgate, Singapore, January 2024

Introduction¶

Using machine and deep learning techniques to predict price movements is considered the holy grail of modern finance, attracting attention from individual professionals and hobbyists through to large multinational businesses, with mixed results. Given the stochastic nature of returns, it is arguable whether a reliable algorithm can be found that is actually effective in the "wild".

Operational complexities such as timely access to information, the brokerage spread and transaction costs make the task difficult before even considering the more philosophical question of how efficient the markets are. This is ratcheted up a notch by the irrational movements of crypto markets, which arguably have no underlying inherent value. Of course, this depends on who you ask. What is true is that crypto markets are very volatile, offering opportunities for large gains for those brave enough to take the risk.

Applying deep learning to these markets, whilst offering an interesting academic problem to explore, is unlikely to offer additional insight, and the below should not be used as a basis for any investment decisions.

A Note on the Project Workflow¶

The approach to this problem will follow the well-trodden machine learning workflow:

  • Problem statement
  • Data collection
  • Exploratory data analysis
  • Data cleaning
  • Feature scaling and selection
  • Model design and hypertuning
  • Model evaluation

Given the subject matter is financial time series forecasting, the report will also include backtesting of the predictions versus a long-only hold strategy, to understand whether it achieves its objective of outperforming the market.

A full process diagram of how the problem was approached and the model built is included in Appendix 1.

Problem Statement¶

The objective is to produce a model that can predict positive moves using Long Short-Term Memory (LSTM) networks in short term financial time series.

I have chosen Ethereum (ETH) as the ticker to analyse (technically a pair with USD). Crypto markets are notoriously volatile and it seems like a decent challenge to try and tease some insight out of the mess.

For this purpose, I will aim to predict an hourly positive return. Defining a positive return is discussed in more detail as part of the labels section. This will be a binary classification problem with 1 being the label for a positive move and 0 otherwise.

Whilst the accuracy of the predictions will be an important metric, precision, recall and F1 will arguably be more important as measures of success. The precision on upward moves in particular appears important, as a false positive translates into a realised financial loss from buying and then selling at a loss in a high frequency setting. A thorough discussion of metrics is considered in the following pages.

Data Collection¶

Raw Data¶

Access to data is one of the biggest challenges for deep learning problems. The amount of data required to train a (good) deep learning neural network is usually much more than is available outside of a professional setting. High frequency intraday data especially is difficult to come by, presumably due to the differences between exchanges, the cost of storage and how valuable the data can be.

After exploring Yahoo Finance (via the yfinance python package) and the Alpha Vantage API, it became apparent that these sources did not have a sufficient quantity of reliable data.

In the end, the data was sourced from https://www.cryptodatadownload.com/data/ from the data available from the Gemini exchange.

[Figure: ETH/USD hourly close price from the Gemini exchange]

The data appears to be relatively complete on an arbitrary inspection and shows the meteoric rise of the crypto markets generally in 2021 and 2022, followed by the collapse in price due to the FTX and LUNA scandals. The raw data goes back to 2016 but is truncated in the above chart for the reasons explained in the next few paragraphs.

The crypto market is notoriously emotion driven; even a glance at social media or news outlets gives a sense of how true this is. It follows that some measure of sentiment capturing this emotional investing could give interesting insight into the problem statement. There is a useful resource, updated daily on alternative.me, called the Fear and Greed Index.

The index takes a weighted approach to a number of factors across 5 (formerly 6) data sources. A numerical value is assigned which falls into one of the following categories:

  • Extreme Fear
  • Fear
  • Neutral
  • Greed
  • Extreme Greed

The index is updated daily at 00:00 UTC.

fg_value fg_value_classification
timestamp
2023-12-23 00:00:00+00:00 70 greed
2023-12-22 00:00:00+00:00 74 greed
2023-12-21 00:00:00+00:00 70 greed
2023-12-20 00:00:00+00:00 74 greed
2023-12-19 00:00:00+00:00 73 greed

The index could be a good indicator of sentiment in the crypto market as a whole. Crypto tokens do not have fundamental data so a traditional fundamental analysis cannot be undertaken. However, there are metrics associated with blockchains that can be accessed (such as transactions per second, blocks mined etc) that could be worth exploring in a further analysis but are outside the scope of this paper.

The index only began on 1 February 2018, so all price data before this date has been dropped and the daily metric forward filled to the hourly data. This reflects the assumption that the index applies to price movements throughout the day.

Feature Engineering¶

Feature engineering is the catch-all term for using domain knowledge to generate insights from the raw dataset. Common data transformations for financial time series are known as technical analysis, with associated literature that spans many volumes.

Using pandas-ta, I have generated standard technical indicators for the data based on high, low, open, close and volume. I have also generated temporal data to investigate whether there is any kind of seasonality to returns. I have used the scikit-learn OneHotEncoder to encode these values (and the FG classification) into binary variables.

Interestingly, the pandas-ta module implements TA-Lib candlestick patterns. Reading candlesticks is a classical form of technical analysis, originating from the rice markets of Japan. It will be interesting to see which patterns, if any, lead to an upward tick.
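
As an illustration of this step, below is a minimal sketch assuming an hourly OHLCV DataFrame `df` indexed by a UTC timestamp. Only a handful of indicators are shown (the full feature set is far larger), the candlestick pattern call requires TA-Lib to be installed, and the `sparse_output` argument assumes scikit-learn 1.2 or later.

```python
import pandas as pd
import pandas_ta as ta  # technical analysis indicators
from sklearn.preprocessing import OneHotEncoder

# df: hourly OHLCV DataFrame indexed by a UTC DatetimeIndex (assumed)
df.ta.rsi(length=14, append=True)              # momentum indicator
df.ta.sma(length=10, append=True)              # trend indicator
df.ta.cdl_pattern(name="doji", append=True)    # candlestick pattern (needs TA-Lib)

# Temporal features for the seasonality investigation
temporal = pd.DataFrame(
    {"hour": df.index.hour, "day_of_week": df.index.dayofweek, "month": df.index.month},
    index=df.index,
)

# One-hot encode the temporal columns into binary variables
encoder = OneHotEncoder(sparse_output=False)
encoded = pd.DataFrame(
    encoder.fit_transform(temporal),
    columns=encoder.get_feature_names_out(temporal.columns),
    index=df.index,
)
features = pd.concat([df, encoded], axis=1)
```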

Labelling the Data¶

Given that the problem statement is to predict an hourly positive return, the 1-period return is calculated as follows, where $p$ is the closing price:

$$ r_t = \ln\left(\frac{p_t}{p_{t-1}}\right) $$

A practical approach to predicting a positive return for these purposes would be any net return (i.e. after transaction costs).

Here are the fees from the Gemini exchange for reference. The taker fee at the lowest volume per month is 0.4%. To account for interest on margin, I will round this up to 0.5% as an estimate.

Therefore, a label of 1 will mean that the upward return in the next hour is greater than 0.5% and 0 otherwise. Mathematically:

$$ y_t = \begin{cases} 1 & \quad \text{if } r_{t+1} > 0.005\\ 0 & \quad \text{otherwise} \end{cases} $$
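
A short sketch of the labelling step, assuming the hourly DataFrame `df` from above with a `close` column; the 0.005 threshold is the estimated round-trip cost discussed above.

```python
import numpy as np

FEE_THRESHOLD = 0.005  # 0.4% taker fee rounded up to 0.5% to allow for margin interest

# 1-period log return of the close price
df["return"] = np.log(df["close"] / df["close"].shift(1))

# Label: 1 if the *next* hour's return exceeds the fee threshold, else 0
df["label"] = (df["return"].shift(-1) > FEE_THRESHOLD).astype(int)

# The last row has no forward return to label, so drop it
df = df.iloc[:-1]
```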

Exploratory Data Analysis¶

EDA is an important step in any machine learning workflow. Initial analysis of the data after engineering the technical indicators indicated that some features needed to be removed. This is discussed further in the cleaning section.

[Figure: Histograms of the hourly return and fg_value]

Some observations of the above:

  • A simple histogram of the return and the fg_value indicates, as expected, that the returns are clustered around 0, but there are some significant outliers and high peaks.

  • The fear and greed value is distributed towards the lower end, possibly indicating that overall, fear dominated the crypto market in the time period in question.

[Figure: Pairwise plots of return, fg_value and volume]

Some observations from the above data:

  • Return does not seem to be correlated with fg_value. However, at higher fg_values (i.e. more greed in the index), the variance of returns appears to decrease (heteroscedasticity).
  • There appears to be a slight negative correlation between fg_value and volume, indicating that less volume is traded during times of greed in the index.
  • Return does not have significant outliers, but volume does look like a candidate for the robust scaler.
label
0    40858
1     9809
Name: count, dtype: int64

The above does indicate that there is quite a severe class imbalance in the data that will need to be addressed at the model building stage otherwise the model will likely underperform due to bias.

Next, I have examined the correlation between features. Collinearity between features is present and there are a number of features that will need cleaning before they can be used in the model.

[Figure: Correlation heatmap of the cleaned features]

The heatmap of cleaned features shows clear collinearity between a number of features. Collinearity in ML problems affects performance and interpretability and so is generally best removed. There are multiple methods of doing this but in this paper, I have focused on the following two methods:

  • Only retaining the first variable in a highly correlated pair
  • Discarding variables with a variance inflation factor of greater than 5

The first method is self-explanatory; the second is defined as:

$$ VIF_i = \frac{1}{1-R^{2}_{i}} $$

Where $R^{2}_{i}$ is the unadjusted coefficient of determination from regressing variable $i$ on all the remaining independent variables. A VIF equal to 1 indicates that the variable is not correlated with the others; a value between 1 and 5 indicates moderate correlation with the other variables; and a value greater than 5 indicates high correlation with other variables.
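
A sketch of the VIF filter is below, using the statsmodels implementation; `X` is assumed to be the DataFrame of candidate numeric features. It simply iterates the definition above, dropping the worst offender until every remaining VIF is at or below 5.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Iteratively drop the feature with the highest VIF until all VIFs <= threshold."""
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        worst = vifs.idxmax()
        if vifs[worst] <= threshold:
            return X
        X = X.drop(columns=[worst])  # remove the most collinear feature and re-check
```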

Data Cleaning¶

As already alluded to in the preceding sections, the joining of the two raw data sources and the computation of the technical indicators require some cleaning. A number of steps have been taken to deal with this in the clean_scale.py script. These are:

  • Drop all columns where pandas.ta has calculated NaN.
  • Drop columns where 20,000 datapoints (out of 65k+) are missing.
  • Remove the leading NaN rows of data due to calculation of rolling amounts (simple moving averages etc.).
  • Removal of all columns with no variance, indicating a single value, since a constant column carries no information for the algorithm.

After this process, there are 324 features remaining in the dataset with hourly data from 26/02/2018 until 11/12/2023.

Feature Scaling¶

In order to get the best results from deep learning models, data generally needs to be scaled to aid in faster calculation of cost functions during gradient descent. There are various scaling techniques commonly used but this paper concentrates on two.

Min Max Scaler¶

The min max scaler rescales all features to within a range based on the following calculation:

$$ x_{scaled} = \frac{x_i - min(x)}{max(x) - min(x)} $$

This scaler is relatively sensitive to outliers but is generally good on many financial time series problems.

Robust Scaler¶

The Robust Scaler scales variables with significant outliers by using the quartiles of $x$ to scale the variables.

$$ x_{scaled} = \frac{x_i - Q_{1}(x)}{Q_{3}(x) - Q_{1}(x)} = \frac{x_i - Q_{1}(x)}{IQR(x)} $$

A significant outlier is defined for the purposes of this paper as one that is more than 10 times the IQR. Appendix 2 lists each feature remaining after cleaning the data and the scaler applied to each.

Split Data into Train and Test¶

Before applying the chosen scaling methods to each column, the data needs to be split into train and test sets. The scaling algorithm is fit to the training data only, and the test data is scaled using metrics calculated on the training data. This avoids leakage of the test data into the training dataset and also helps with regularization. Note that as this is time series data, the data should not be shuffled.

Once the train and test data are split, the train data is split again into a train and validation set to test the model during the training process.

All sets are then scaled in clean_scale.py using $x$ values calculated from the training set only to avoid any data leakage.
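
A minimal sketch of the split-then-scale logic is below, assuming the engineered DataFrame `features` and a hypothetical, abbreviated `scaler_map` recording the scaler chosen for each column per Appendix 2. The split proportions are illustrative.

```python
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Chronological split -- no shuffling for time series data
n = len(features)
train = features.iloc[: int(0.70 * n)].copy()
val = features.iloc[int(0.70 * n): int(0.85 * n)].copy()
test = features.iloc[int(0.85 * n):].copy()

# Hypothetical mapping of column -> scaler class (see Appendix 2), abbreviated here
scaler_map = {"close": MinMaxScaler, "volume": RobustScaler}

for col, scaler_cls in scaler_map.items():
    scaler = scaler_cls()
    # Fit on the training data only, then apply the same transform everywhere
    train[[col]] = scaler.fit_transform(train[[col]])
    val[[col]] = scaler.transform(val[[col]])
    test[[col]] = scaler.transform(test[[col]])
```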

Feature Selection¶

There are many (and varied) techniques to feature selection and feature engineering as a whole, necessitating experimentation to try and meet the main objective of an efficient model with good predictive powers.

To attempt to achieve this, I have split the problem into four stages:

  1. Removal of collinearity using one of the two techniques described above.
  2. Use of boruta as a feature selection algorithm.
  3. Dimension reduction using Uniform Manifold Approximation & Projection (UMAP), a relatively new and novel unsupervised learning algorithm.
  4. Input the results of the above pipeline (or part thereof) into a baseline one-layer LSTM model using keras and analyse the results.

The best performing pipeline above will be chosen to test other model architectures.

Boruta¶

The Boruta algorithm is designed around a random forest classifier. It seeks to establish which features contribute to the overall model. It does this by duplicating and shuffling the dataset into "shadow features". The classifier (in this case, a random forest) is then trained on both sets of data and each feature's importance is compared to that of the shadow features. If a feature has greater importance than its shadow equivalent, it is retained.

The algorithm is implemented in Python through the boruta_py package. In order to capture as much data as possible, the "perc" parameter was set at 90, in line with the documentation, so as to avoid too strict an interpretation of importance.
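
A sketch of how boruta_py might be run for this step, assuming scaled training features `X_train` (a DataFrame) and labels `y_train`; BorutaPy expects numpy arrays, and the estimator settings shown are illustrative.

```python
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

# Random forest as the underlying estimator; settings here are illustrative
rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)

# perc=90: a feature is kept if its importance beats the 90th percentile of the
# shadow-feature importances (slightly less strict than the default of 100)
selector = BorutaPy(rf, n_estimators="auto", perc=90, random_state=42)
selector.fit(X_train.values, y_train.values)

retained = X_train.columns[selector.support_].tolist()
X_train_boruta = X_train[retained]
```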

UMAP¶

Per the documentation, UMAP is a dimension reduction technique that can be used for visualisation similarly to t-distributed Stochastic Neighbor Embedding (t-SNE), but also for general non-linear dimension reduction. The mathematics can be found in McInnes, L. and Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. According to the literature, UMAP offers significant performance improvements over other dimensionality reduction techniques such as t-SNE and SOM. PCA was discounted in this analysis as it does not tend to work well on non-linear data.

The python implementation of UMAP takes a number of parameters, but there are two main parameters that impact the clustering of the algorithm on the 2D plane.

| Hyperparameter | Description | Value |
|---|---|---|
| n_neighbors | Controls how UMAP balances local versus global structure in the data by constraining the size of the local neighbourhood UMAP looks at when attempting to learn the manifold structure. Lower values of n_neighbors force the algorithm to focus more on local structure, potentially losing some of the global structure, and vice versa. | 10 |
| min_dist | Controls how tightly UMAP is allowed to pack points together in the 2D representation. A lower min_dist generally means that points will clump together more. The choice of min_dist depends on the use case, with a lower value generally being more useful for clustering problems. | 0.1 (default) |

Appendix 3 explores the output of the algorithm on changing the above parameters and demonstrates why the above were chosen as a middle ground on local and global structure to use for dimension reduction of the dataset.
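
For reference, a minimal sketch of applying UMAP with the chosen parameters; `X_train_sel` and `X_test_sel` are assumed to be the scaled, selected feature matrices, and `n_components=2` matches the 2D representation discussed above.

```python
import umap

# n_neighbors=10 as a middle ground between local and global structure;
# min_dist left at its default of 0.1
reducer = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=2, random_state=42)

# Fit the embedding on the training data only, then project the test data
emb_train = reducer.fit_transform(X_train_sel)   # shape: (n_samples, 2)
emb_test = reducer.transform(X_test_sel)
```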

Baseline Model¶

In order to evaluate the various pipelines and understand the best set of features for this particular problem statement, a baseline deep learning model was used. This consisted of a single LSTM layer with 36 units and 'relu' activation, being:

$$ x^+ = \begin{cases} x & \quad \text{if } x > 0\\ 0 & \quad \text{otherwise} \end{cases} $$

The past 6 hours (i.e. the current and preceding 5) of data was used as the sequence length, on the basis that this seemed a reasonable amount of time for predicting an up movement in 1 hour's time while also being a good middle ground in terms of performance. The model structure is detailed in the diagram below.

[Figure: Baseline model architecture]

The model input is a 3-dimensional tensor of the form (batch, sequence length, features). The output of the dense layer is a probability $p$ that the label will be 1, such that:

$$ \text{prediction} = \begin{cases} 1 & \quad \text{if } p > 0.5\\ 0 & \quad \text{otherwise} \end{cases} $$
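
The windowed tensors can be produced with the keras timeseries utility function referred to later in the report; a sketch is below, assuming a scaled feature array `X_train` and a label array `y_train` aligned row-for-row. The target for a window ending at row t is y[t], hence the offset of SEQ_LEN - 1.

```python
import tensorflow as tf

SEQ_LEN = 6  # the current hour plus the preceding 5

train_ds = tf.keras.utils.timeseries_dataset_from_array(
    data=X_train,                   # shape (n_samples, n_features)
    targets=y_train[SEQ_LEN - 1:],  # label aligned to the *end* of each window
    sequence_length=SEQ_LEN,
    batch_size=64,
    shuffle=False,                  # preserve temporal order
)
```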

The optimizer used is Adam (see Kingma et al. 2014) and the loss function is binary cross entropy, being the most appropriate loss function for evaluating binary classification problems.

Before training, the class imbalance was dealt with by assigning a weight to each class. The weights are calculated by taking 1 divided by the count of the class, multiplied by the total length of the array divided by 2. The resultant python dictionary is passed to the keras fit method as the class_weight argument.

An early stopping callback monitoring the validation loss was used (i.e. the model will stop training 10 epochs after the validation loss has ceased decreasing).
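
Putting the pieces together, a sketch of the baseline model, class weights and callback is below; `SEQ_LEN`, `n_features`, `y_train`, `train_ds` and `val_ds` are assumed to exist from the earlier steps.

```python
import numpy as np
import tensorflow as tf

# Class weights: (1 / class count) * (total / 2), as described above
counts = np.bincount(y_train)
total = len(y_train)
class_weight = {0: (1 / counts[0]) * (total / 2.0),
                1: (1 / counts[1]) * (total / 2.0)}

# Baseline: a single LSTM layer with 36 units and relu activation
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, n_features)),
    tf.keras.layers.LSTM(36, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.BinaryAccuracy(),
             tf.keras.metrics.Precision(),
             tf.keras.metrics.Recall()],
)

# Stop training 10 epochs after the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)

history = model.fit(train_ds, validation_data=val_ds, epochs=50,
                    class_weight=class_weight, callbacks=[early_stop])
```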

Experiment Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
run_pairwisecorr__boruta_01-14-2024-14:16:04 0.698578 13 0.392012 13 0.885132 3 0.484584 17
run_boruta_01-14-2024-14:40:32 0.581991 4 0.329438 4 0.906438 1 0.468319 4
run_vif_01-14-2024-14:23:39 0.641601 28 0.342770 28 0.928208 0 0.466315 16
run_vif__boruta_01-14-2024-14:31:03 0.583992 34 0.324927 34 0.858268 4 0.457744 2
run_vif__umap_01-14-2024-14:50:45 0.656767 11 0.345592 11 0.781843 0 0.435312 9
run_pairwisecorr_01-14-2024-14:15:30 0.562085 0 0.274124 0 0.930523 9 0.403786 16
run_all_01-14-2024-14:14:58 0.706477 2 0.353408 2 0.963872 5 0.400226 18
run_boruta__umap_01-14-2024-14:45:33 0.674882 7 0.332054 9 0.907365 0 0.391881 1
run_pairwisecorr__boruta__umap_01-14-2024-14:19:45 0.718483 10 0.374512 10 0.358962 2 0.365049 2
run_umap_01-14-2024-14:50:18 0.688046 10 0.260012 10 0.554887 1 0.331351 1

An F1 score has been calculated at each epoch using the precision and recall. The F1 metric attempts to evaluate the model on its class-wise performance and is the harmonic mean of the precision and recall scores. Mathematically:

$Precision = \frac{TP}{TP + FP}$

$Recall = \frac{TP}{TP + FN}$

$F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$

where

$TP = \text{True Positive}$

$FP = \text{False Positive}$

$FN = \text{False Negative}$

For a given class:

  • High precision and high recall - the class has been handled well by the model.
  • High precision and low recall - the class is not well detected, but when it is, the model is very reliable.
  • Low precision and high recall - the class is well detected, but the predictions also include observations of other classes.
  • Low precision and low recall - the class has not been handled well at all.

The use of F1 is due to the fact that precision and recall are often antagonistic. The F1 score measures both precision and recall in one measure. The higher the F1 score, the better the overall model in classifying both classes.

The F1 score has been used as the principal metric on which to evaluate the model as it offers a good "all encompassing" metric where neither precision nor recall are obviously more important.
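
For completeness, a small sketch of how the per-epoch F1 can be derived from the logged precision and recall, assuming the keras metric names default to "precision" and "recall" as registered above.

```python
import numpy as np

precision = np.array(history.history["val_precision"])
recall = np.array(history.history["val_recall"])

# Harmonic mean per epoch; the small epsilon avoids division by zero
f1 = 2 * precision * recall / (precision + recall + 1e-9)

best_epoch = int(f1.argmax())
print(f"Best validation F1 {f1[best_epoch]:.3f} at epoch {best_epoch}")
```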

Pipeline Selection¶

From the above results, the preprocessing that gives the highest F1 is pairwise reduction in correlation followed by using Boruta to assess the remaining features. This is the preprocessing pipeline I have adopted for the remainder of this report.

Deep Learning Model¶

A Sensible Baseline to Beat¶

In order to judge whether the model is achieving its stated objective, a sensible baseline to beat should be established. Given the class counts above, predicting a 0 every time would result in an accuracy of 0.8064 (albeit with no chance of a profit, as an investor would never take any risk). This would result in a precision and recall on class 1 of 0, which should be beatable.

If we can approach this sensible baseline, then a backtest would determine if a strategy based on these signals would result in any profit, over and above a long hold.

Tested Models¶

In line with the requirements of the project, LSTM models have been used to attempt the classification problem. An LSTM layer is a type of recurrent neural network that enables the learning of long term dependencies. They were first proposed in Hochreiter & Schmidhuber (1997) and refined since then. The model structure is included below.

[Figure: LSTM cell structure]

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

The key to the LSTM is the cell state, which is represented by the horizontal line in the above diagram. This line carries the prior information and passes through a number of gates that have the ability to add or remove information. The models are widely used in a number of machine learning problems.

For the purposes of this paper, 1, 2 and 3 layer LSTMs were considered when evaluating the predictive power of the model. The additional layers add complexity and computing time to the model, which needs to be weighed against the predictive power.

Dropout layers, whilst useful for regularization of other ML problems, are known to hinder learning in RNNs. In line with Gal & Ghahramani (2016), the recurrent dropout parameter was used to introduce dropout into the models. This parameter uses the same dropout mask on each unit; using the same dropout mask at every timestep allows the network to properly propagate its learning error through time, whereas a temporally random dropout would disrupt this error (Deep Learning with Python, Chollet).
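
A sketch of how the stacked variants might be built with recurrent dropout is below; `SEQ_LEN` and `n_features` are assumed as before, and the unit count and dropout rate shown are illustrative.

```python
import tensorflow as tf

def build_lstm(n_layers: int, units: int = 36, rec_dropout: float = 0.2) -> tf.keras.Model:
    """1-, 2- or 3-layer LSTM with the same recurrent dropout mask at every timestep."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(SEQ_LEN, n_features)))
    for i in range(n_layers):
        model.add(tf.keras.layers.LSTM(
            units,
            recurrent_dropout=rec_dropout,
            return_sequences=(i < n_layers - 1),  # only the final layer emits a single vector
        ))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

two_layer_dropout = build_lstm(n_layers=2)
```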

Each of these models was tested using 4 lookback periods, being 1, 6, 12 and 24 hours. The results of the experiments are presented below.

Experiment Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
baseline_24_hour 0.712251 7 0.405663 7 0.862117 5 0.493919 11
two_layer_dropout_24_hour 0.658436 18 0.372197 18 0.912256 1 0.493506 18
two_layer_12_hour 0.686268 22 0.388934 22 0.906858 1 0.490676 22
three_layer_dropout_12_hour 0.623353 11 0.353416 11 0.910565 9 0.488552 11
baseline_dropout_24_hour 0.688720 17 0.389321 17 0.896472 24 0.486957 17
two_layer_dropout_6_hour 0.642654 30 0.359836 30 0.935618 0 0.482853 30
three_layer_12_hour 0.641058 25 0.358053 25 0.933735 1 0.480317 25
baseline_12_hour 0.616503 16 0.345375 16 0.886932 7 0.476177 16
baseline_dropout_6_hour 0.612428 28 0.342449 28 0.900417 0 0.475862 20
three_layer_dropout_6_hour 0.609689 38 0.341398 38 0.942103 1 0.473280 38
two_layer_6_hour 0.638968 6 0.353903 6 0.908754 12 0.472778 6
two_layer_24_hour 0.645141 28 0.354513 28 0.902971 0 0.470426 29
two_layer_dropout_12_hour 0.580883 23 0.327153 31 0.900371 1 0.468639 31
three_layer_6_hour 0.583886 10 0.329333 10 0.911070 0 0.466730 10
baseline_dropout_12_hour 0.579408 25 0.327369 25 0.926784 6 0.465515 25
baseline_6_hour 0.600527 21 0.332238 21 0.902733 0 0.463391 17
baseline_1_hour 0.626526 23 0.340106 23 0.874711 3 0.461673 25
baseline_dropout_1_hour 0.629579 24 0.342326 24 0.874711 3 0.461654 18
three_layer_dropout_1_hour 0.565895 5 0.321109 5 0.926491 0 0.460492 5
three_layer_1_hour 0.565579 5 0.320999 5 0.926491 0 0.460452 5
two_layer_dropout_1_hour 0.515579 0 0.299655 0 0.917245 3 0.443265 8
two_layer_1_hour 0.515579 0 0.299655 0 0.917245 3 0.442980 8
three_layer_24_hour 0.564419 2 0.281820 4 0.972609 9 0.410888 14
three_layer_dropout_24_hour 0.475150 2 0.250718 6 0.991643 19 0.387369 6

The above indicates that the baseline model with a lookback period of 24 hours performed the best in terms of accuracy and the F1 statistic (on the validation data). This model structure will be selected as the final model for hyperparameter tuning.

Hypertuning Strategy¶

Having determined the model structure that results in the highest F1 score, the hyperparameters of the model are tuned in order to try and improve performance. Using Keras Tuner for this task, the first decision is the choice in tuning algorithm.

| Tuner | Description |
|---|---|
| RandomSearch | Chooses hyperparameters at random. Computationally expensive and completely random (as the name suggests) in finding a best set of hyperparameters. |
| GridSearch | Similar to the grid search in Scikit-Learn, this tuner attempts every possible combination of hyperparameters to find the best. Again, computationally (very) expensive. |
| BayesianOptimization | Uses a Bayesian approach to optimization based on a Gaussian process. |
| Hyperband | The Keras Tuner implementation of Hyperband, a novel bandit-based approach to optimization based on the paper by Li, Lisha, and Kevin Jamieson, "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". |

According to the algorithm's authors, Hyperband offers performance advantages over the other available tuning algorithms. This is the tuner I have used to tune the model.

The parameters to be tuned are:

| Parameter | Description |
|---------------------|---------------------------------------------------------------------------------------------------------------------------|
| Units in LSTM layer | The number of units for the LSTM layer |
| Activation function | A choice of relu, elu, tanh, sigmoid or selu |
| Learning rate | The learning rate of the gradient descent. A float between 0.0005 and 0.01 |
| Beta 1 | The exponential decay rate for the first moment estimates. A float between 0.5 and 0.99 |
| Beta 2 | The exponential decay rate for the second-moment estimates. Generally should be set close to 1. A float between 0.5 and 0.9999 |

Given the validation loss of the model seems to minimise at around epoch 7, the max epochs parameter for the tuner was set at 15. This allows enough headroom for testing hyperparameters without being overly wasteful of resources.
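
A sketch of the Hyperband setup with Keras Tuner is shown below, covering the search space in the table above; the 24-hour lookback, dataset objects and class weights are assumed from earlier, and the directory and project names are illustrative.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(24, n_features)),   # 24-hour lookback
        tf.keras.layers.LSTM(
            hp.Int("units", min_value=32, max_value=512, step=32),
            activation=hp.Choice("activation", ["relu", "elu", "tanh", "sigmoid", "selu"]),
        ),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=hp.Float("learning_rate", 5e-4, 1e-2, sampling="log"),
        beta_1=hp.Float("beta_1", 0.5, 0.99),
        beta_2=hp.Float("beta_2", 0.5, 0.9999),
    )
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
    return model

tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=15,
                     directory="tuning", project_name="eth_lstm")
tuner.search(train_ds, validation_data=val_ds, class_weight=class_weight)
best_hp = tuner.get_best_hyperparameters(1)[0]
```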

After the tuning process, trial 28 was chosen and the final parameters were:

| Parameter | Value |
|---|---|
| Units | 400 |
| Activation | tanh |
| Learning rate | 0.00058215 |

Both betas were tuned but lost in the output when generating the model. All the results of the hypertuning can be found in Appendix 4.

Final Model¶
[Figure: Final model architecture]

Model Evaluation¶

To evaluate the model, it is tested on the data which has been held back. The scaled data is reimported and put through the pairwise correlation and boruta pipeline. The result is converted to a tensorflow Dataset using the timeseries utility function, giving the same form of dataset as used in the experiments described above.

retained_feature
0 open
1 high
2 ABER_XG_5_15
3 ACCBL_20
4 ACCBU_20
5 ADX_14
6 AGj_13_8_5
7 ALPHAT_14_1_50
8 OBV_min_2
9 OBV_max_2
10 AROONOSC_14
11 BBU_5_2.0
12 BIAS_SMA_26
13 BOP
14 PIVOTS_TRAD_D_R1
15 UO_7_14_28

It is interesting to note when examining the retained features that the fear and greed index data and the seasonality data have been dropped. This indicates that these features may need some additional transformations in order to have a significant impact on the model, or that they are simply not as useful in predicting short term movements as other features.

It is also interesting to note that the candlestick patterns are not included either. There goes hundreds of years of Japanese know how!

The remaining features seem to be a good mix of raw data, momentum and trend indicators. The function names can be found at https://ta-lib.org/functions/ and there are volumes of information on their definition and usage available online.

              precision    recall  f1-score   support

           0       0.94      0.81      0.87     11169
           1       0.31      0.64      0.42      1475

    accuracy                           0.79     12644
   macro avg       0.63      0.73      0.64     12644
weighted avg       0.87      0.79      0.82     12644

The above indicates quite encouraging results for the model when it is tested on the test data. The model predicts the 0 class well but struggles more with class 1. For the use case of this model, whereby a person would use these signals to enter or exit a trade, it could still be useful, as any incorrect position would quickly be exited on a correct prediction of class 0.

The weighted average F1 score also indicates a model with decent predictive power. The results are quite encouraging!

[Figure: Normalised confusion matrix on the test data]

In constructing the confusion matrix, the values have been normalised due to the class imbalance. This generally gives a better understanding of how the model is predicting each class in these cases.

The results are promising, indicating that the model has a relatively small proportion of false positives and false negatives. As false positives are more likely to result in actual loss of investment value, this is encouraging. The model appears to perform relatively well, all things considered.

[Figure: ROC curve on the test data]

$\text{True Positive Rate} = \text{Recall}$

$\text{False Positive Rate} = \frac{\text{False Positive}}{\text{True Negative} + \text{False Positive}}$

The red line on the above diagram represents the situation where the true positive rate is equal to the false positive rate. Points above this line indicate where the proportion of correctly classified points belonging to the positive class is greater than the proportion of incorrectly classified points belonging to the negative class.

A perfect model that correctly classified everything would have its elbow point at the coordinates (0, 1). The area under the curve (AUC) provides an aggregate measure of performance across all possible classification thresholds. An AUC above 0.5 indicates that the model has some predictive power over a random choice; in short, the closer the curve bends towards the top-left corner (and the larger the AUC), the better the predictive power of the model.

An AUC of 0.73 is not bad and indicates that the model has moderate predictive power, which will be tested further in the backtesting section.

All in all, the model appears to be well regularized and performs well on the test data. Finally, a backtest needs to be conducted to simulate how the predictions perform as signals for trading decisions.

Backtesting¶

Approaches to backtesting can vary, but I have implemented a long-only strategy. This will be compared to a long hold from the beginning of the period.

A few assumptions have been made when considering the strategy:

  • That the minimum holding period will be for the duration of the timestep (i.e. 1 hour in this case)
  • That a long position can be taken in the security at the close price from the previous step, thus locking in the maximum return
  • That the investor operating the strategy can borrow for free or has access to cash to continue taking positions when all funds are lost

These assumptions are not appropriate in a real market situation and should be considered further should this strategy ever be used in production.
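
A simplified sketch of the backtest logic under these assumptions is below; `test_returns` and `test_predictions` are assumed to be the hourly log returns and model predictions over the test period, aligned on the same timestamp index.

```python
import pandas as pd

bt = pd.DataFrame({"return": test_returns, "predictions": test_predictions})

# A signal generated at t is held over the *next* hour, so shift it forward one step
bt["holding_begin_period"] = bt["predictions"].shift(1).fillna(0)

# Cumulative log return of the long-only signal strategy vs simply holding ETH
bt["strategy_return"] = (bt["holding_begin_period"] * bt["return"]).cumsum()
bt["long_only_hold"] = bt["return"].cumsum()
```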

[Figure: Cumulative strategy return versus long-only hold]

Wow! I'm going to be a billionaire!

But seriously, the backtesting does appear to indicate that the model is very good at generating a consistent return. However, significantly more testing would need to be undertaken before using the model 'in the wild' with real time market data, order books and real money. The above also indicates that there may be some data leakage in the model that should be investigated as part of any further analysis.

Arguably, the reason for the good backtest is that downward movements are predicted very well (0.94 precision), meaning any mistakes are quickly rectified. Combined with how generally small hourly returns are, any incorrect position would quickly be closed out. Transaction costs would need to be built into any production model to better assess whether the strategy would work.

The first 100 results of the strategy can be found in Appendix 5.

Conclusions¶

The final model failed to beat the sensible baseline by 1.64%. However, the confusion matrix and AUC metrics indicate the model performs well in predicting true positives and true negatives, indicating a good level of performance. It also did relatively well when real investment actions were simulated (subject to further testing), so the model could warrant further analysis incorporating more real world trading conditions.

It is interesting to note that the fear and greed index ultimately had no impact on the analysis, potentially indicating that there is value in ignoring the noise and sticking to a dispassionate strategy.

Thank you for reading.

Appendices¶

Appendix 1 - Process Diagram¶

Appendix 2 - Features and Scalers¶
feature
scaler
RobustScaler ABER_ATR_5_15
MinMaxScaler ABER_SG_5_15
MinMaxScaler ABER_XG_5_15
MinMaxScaler ABER_ZG_5_15
MinMaxScaler ACCBL_20
MinMaxScaler ACCBM_20
MinMaxScaler ACCBU_20
MinMaxScaler AD
RobustScaler ADOSC_3_10
MinMaxScaler ADXR_14_2
MinMaxScaler ADX_14
MinMaxScaler AGj_13_8_5
MinMaxScaler AGl_13_8_5
MinMaxScaler AGt_13_8_5
MinMaxScaler ALMA_9_6.0_0.85
MinMaxScaler ALPHAT_14_1_50
MinMaxScaler ALPHATl_14_1_50_2
MinMaxScaler AMATe_LR_8_21_2
MinMaxScaler AMATe_SR_8_21_2
MinMaxScaler AOBV_LR_2
MinMaxScaler AOBV_SR_2
RobustScaler AO_5_34
RobustScaler APO_12_26
MinMaxScaler AROOND_14
MinMaxScaler AROONOSC_14
MinMaxScaler AROONU_14
MinMaxScaler AR_26
MinMaxScaler ATRTSe_14_20_3.0
RobustScaler ATRr_14
RobustScaler BBB_5_2.0
MinMaxScaler BBL_5_2.0
MinMaxScaler BBM_5_2.0
RobustScaler BBP_5_2.0
MinMaxScaler BBU_5_2.0
RobustScaler BEARP_13
RobustScaler BIAS_SMA_26
MinMaxScaler BOP
MinMaxScaler BR_26
RobustScaler BULLP_13
MinMaxScaler CCI_14_0.015
MinMaxScaler CDL_3WHITESOLDIERS
MinMaxScaler CDL_ADVANCEBLOCK
MinMaxScaler CDL_BELTHOLD
MinMaxScaler CDL_CLOSINGMARUBOZU
MinMaxScaler CDL_DOJI_10_0.1
MinMaxScaler CDL_DRAGONFLYDOJI
MinMaxScaler CDL_GRAVESTONEDOJI
MinMaxScaler CDL_HAMMER
MinMaxScaler CDL_HANGINGMAN
MinMaxScaler CDL_HIGHWAVE
MinMaxScaler CDL_HIKKAKE
MinMaxScaler CDL_HIKKAKEMOD
MinMaxScaler CDL_IDENTICAL3CROWS
MinMaxScaler CDL_INSIDE
MinMaxScaler CDL_LONGLEGGEDDOJI
MinMaxScaler CDL_LONGLINE
MinMaxScaler CDL_MARUBOZU
MinMaxScaler CDL_MATCHINGLOW
MinMaxScaler CDL_RICKSHAWMAN
MinMaxScaler CDL_SEPARATINGLINES
MinMaxScaler CDL_SHORTLINE
MinMaxScaler CDL_SPINNINGTOP
MinMaxScaler CDL_STALLEDPATTERN
MinMaxScaler CDL_TAKURI
RobustScaler CFO_9
RobustScaler CG_10
MinMaxScaler CHDLREXTd_22_22_14_2.0
MinMaxScaler CHDLREXTl_22_22_14_2.0
MinMaxScaler CHDLREXTs_22_22_14_2.0
MinMaxScaler CHOP_14_1_100.0
MinMaxScaler CKSPl_10_3_20
MinMaxScaler CKSPs_10_3_20
MinMaxScaler CMF_20
MinMaxScaler CMO_14
MinMaxScaler COPC_11_14_10
MinMaxScaler CRSI_3_2_100
MinMaxScaler CTI_12
RobustScaler CUBE_3.0_-1
RobustScaler CUBEs_3.0_-1
MinMaxScaler DCL_20_20
MinMaxScaler DCM_20_20
MinMaxScaler DCU_20_20
MinMaxScaler DEC_1
MinMaxScaler DEMA_10
RobustScaler DMN_14
RobustScaler DMP_14
RobustScaler DPO_20
MinMaxScaler D_9_3
MinMaxScaler EBSW_40_10
RobustScaler EFI_13
MinMaxScaler EMA_10
RobustScaler ENTP_10
MinMaxScaler ER_10
MinMaxScaler FAMA_0.5_0.05
MinMaxScaler FISHERT_9_1
MinMaxScaler FISHERTs_9_1
MinMaxScaler FWMA_10
MinMaxScaler HA_close
MinMaxScaler HA_high
MinMaxScaler HA_low
MinMaxScaler HA_open
MinMaxScaler HILO_13_21
MinMaxScaler HL2
MinMaxScaler HLC3
MinMaxScaler HMA_10
MinMaxScaler HWL_1
MinMaxScaler HWMA_0.2_0.1_0.1
MinMaxScaler HWM_1
MinMaxScaler HWU_1
MinMaxScaler ICS_26
MinMaxScaler IKS_26
MinMaxScaler INC_1
MinMaxScaler INERTIA_20_14
MinMaxScaler INVFISHER_1.0
MinMaxScaler INVFISHERs_1.0
MinMaxScaler ISA_9
MinMaxScaler ISB_26
MinMaxScaler ITS_9
MinMaxScaler JMA_7_0.0
MinMaxScaler J_9_3
MinMaxScaler KAMA_10_2_30
MinMaxScaler KCBe_20_2
MinMaxScaler KCLe_20_2
MinMaxScaler KCUe_20_2
MinMaxScaler KST_10_15_20_30_10_10_10_15
MinMaxScaler KSTs_9
RobustScaler KURT_30
RobustScaler KVO_34_55_13
RobustScaler KVOs_34_55_13
MinMaxScaler K_9_3
MinMaxScaler LDECAY_1
MinMaxScaler LINREG_14
RobustScaler LOGRET_1
RobustScaler MACD_12_26_9
RobustScaler MACDh_12_26_9
RobustScaler MACDs_12_26_9
RobustScaler MAD_30
MinMaxScaler MAMA_0.5_0.05
MinMaxScaler MASSI_9_25
MinMaxScaler MCGD_10
MinMaxScaler MEDIAN_30
MinMaxScaler MFI_14
MinMaxScaler MIDPOINT_2
MinMaxScaler MIDPRICE_2
RobustScaler MOM_10
RobustScaler NATR_14
MinMaxScaler NVI_1
MinMaxScaler OBV
MinMaxScaler OBV_max_2
MinMaxScaler OBV_min_2
MinMaxScaler OBVe_12
MinMaxScaler OBVe_4
MinMaxScaler OHLC4
RobustScaler PCTRET_1
RobustScaler PDIST
MinMaxScaler PGO_14
MinMaxScaler PIVOTS_TRAD_D_P
MinMaxScaler PIVOTS_TRAD_D_R1
MinMaxScaler PIVOTS_TRAD_D_R2
MinMaxScaler PIVOTS_TRAD_D_R3
MinMaxScaler PIVOTS_TRAD_D_R4
MinMaxScaler PIVOTS_TRAD_D_S1
MinMaxScaler PIVOTS_TRAD_D_S2
MinMaxScaler PIVOTS_TRAD_D_S3
MinMaxScaler PIVOTS_TRAD_D_S4
RobustScaler PPO_12_26_9
RobustScaler PPOh_12_26_9
RobustScaler PPOs_12_26_9
MinMaxScaler PSARaf_0.02_0.2
MinMaxScaler PSARr_0.02_0.2
MinMaxScaler PSL_12
MinMaxScaler PVI_1
RobustScaler PVOL
MinMaxScaler PVO_12_26_9
MinMaxScaler PVOh_12_26_9
MinMaxScaler PVOs_12_26_9
MinMaxScaler PVR
MinMaxScaler PVT
MinMaxScaler PWMA_10
MinMaxScaler QQE_14_5_4.236
MinMaxScaler QQE_14_5_4.236_RSIMA
RobustScaler QS_10
MinMaxScaler QTL_30_0.5
MinMaxScaler REFLEX_20_20_0.04
MinMaxScaler REMAP_0.0_100.0_-1.0_1.0
MinMaxScaler RMA_10
RobustScaler ROC_10
MinMaxScaler RSI_14
MinMaxScaler RSX_14
MinMaxScaler RVGI_14_4
MinMaxScaler RVGIs_14_4
MinMaxScaler RVI_14
MinMaxScaler RWIh_14
MinMaxScaler RWIl_14
MinMaxScaler SINWMA_14
MinMaxScaler SKEW_30
RobustScaler SLOPE_1
MinMaxScaler SMA_10
MinMaxScaler SMI_5_20_5_1.0
MinMaxScaler SMIo_5_20_5_1.0
MinMaxScaler SMIs_5_20_5_1.0
MinMaxScaler SMMA_7
RobustScaler SQZPRO_20_2.0_20_2.0_1.5_1.0
MinMaxScaler SQZPRO_OFF
MinMaxScaler SQZPRO_ON_NARROW
MinMaxScaler SQZPRO_ON_NORMAL
MinMaxScaler SQZPRO_ON_WIDE
RobustScaler SQZ_20_2.0_20_1.5
MinMaxScaler SQZ_OFF
MinMaxScaler SQZ_ON
MinMaxScaler SSF3_20
MinMaxScaler SSF_20
MinMaxScaler STC_10_12_26_0.5
RobustScaler STCmacd_10_12_26_0.5
MinMaxScaler STCstoch_10_12_26_0.5
RobustScaler STDEV_30
MinMaxScaler STOCHFd_14_3
MinMaxScaler STOCHFk_14_3
MinMaxScaler STOCHRSId_14_14_3_3
MinMaxScaler STOCHRSIk_14_14_3_3
MinMaxScaler STOCHd_14_3_3
MinMaxScaler STOCHh_14_3_3
MinMaxScaler STOCHk_14_3_3
MinMaxScaler SUPERT_7_3.0
MinMaxScaler SUPERTd_7_3.0
MinMaxScaler SWMA_10
MinMaxScaler T3_10_0.7
MinMaxScaler TEMA_10
RobustScaler THERMO_20_2_0.5
MinMaxScaler THERMOl_20_2_0.5
RobustScaler THERMOma_20_2_0.5
MinMaxScaler THERMOs_20_2_0.5
MinMaxScaler TMO_14_5_3
MinMaxScaler TMOs_14_5_3
MinMaxScaler TOS_STDEVALL_LR
MinMaxScaler TOS_STDEVALL_L_1
MinMaxScaler TOS_STDEVALL_L_2
MinMaxScaler TOS_STDEVALL_L_3
MinMaxScaler TOS_STDEVALL_U_1
MinMaxScaler TOS_STDEVALL_U_2
MinMaxScaler TOS_STDEVALL_U_3
MinMaxScaler TRENDFLEX_20_20_0.04
MinMaxScaler TRIMA_10
MinMaxScaler TRIX_30_9
MinMaxScaler TRIXs_30_9
RobustScaler TRUERANGE_1
MinMaxScaler TSI_13_25_13
MinMaxScaler TSIs_13_25_13
RobustScaler TSV_18_10
RobustScaler TSVr_18_10
RobustScaler TSVs_18_10
MinMaxScaler TTM_TRND_6
RobustScaler UI_14
MinMaxScaler UO_7_14_28
RobustScaler VAR_30
MinMaxScaler VHF_28
RobustScaler VHM_610
MinMaxScaler VTXM_14
MinMaxScaler VTXP_14
MinMaxScaler VWAP_D
MinMaxScaler VWMA_10
MinMaxScaler WCP
MinMaxScaler WILLR_14
MinMaxScaler WMA_10
MinMaxScaler ZL_EMA_10
MinMaxScaler ZS_30
MinMaxScaler close
MinMaxScaler close_Z_30_1
MinMaxScaler day_of_week_0
MinMaxScaler day_of_week_1
MinMaxScaler day_of_week_2
MinMaxScaler day_of_week_3
MinMaxScaler day_of_week_4
MinMaxScaler day_of_week_5
MinMaxScaler day_of_week_6
MinMaxScaler fg_value
MinMaxScaler fg_value_classification_extreme fear
MinMaxScaler fg_value_classification_extreme greed
MinMaxScaler fg_value_classification_fear
MinMaxScaler fg_value_classification_greed
MinMaxScaler fg_value_classification_neutral
MinMaxScaler high
MinMaxScaler high_Z_30_1
MinMaxScaler hour_0
MinMaxScaler hour_1
MinMaxScaler hour_10
MinMaxScaler hour_11
MinMaxScaler hour_12
MinMaxScaler hour_13
MinMaxScaler hour_14
MinMaxScaler hour_15
MinMaxScaler hour_16
MinMaxScaler hour_17
MinMaxScaler hour_18
MinMaxScaler hour_19
MinMaxScaler hour_2
MinMaxScaler hour_20
MinMaxScaler hour_21
MinMaxScaler hour_22
MinMaxScaler hour_23
MinMaxScaler hour_3
MinMaxScaler hour_4
MinMaxScaler hour_5
MinMaxScaler hour_6
MinMaxScaler hour_7
MinMaxScaler hour_8
MinMaxScaler hour_9
MinMaxScaler low
MinMaxScaler low_Z_30_1
MinMaxScaler month_1
MinMaxScaler month_10
MinMaxScaler month_11
MinMaxScaler month_12
MinMaxScaler month_2
MinMaxScaler month_3
MinMaxScaler month_4
MinMaxScaler month_5
MinMaxScaler month_6
MinMaxScaler month_7
MinMaxScaler month_8
MinMaxScaler month_9
MinMaxScaler open
MinMaxScaler open_Z_30_1
RobustScaler volume
Appendix 3 - UMAP Parameters¶
[Figure: UMAP projections for varying n_neighbors and min_dist]
Appendix 4 - Hypertuning Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
baseline_24_hour_0028 0.680384 12 0.387734 12 0.863510 4 0.499422 12
baseline_24_hour_0022 0.729239 4 0.419468 4 0.751625 0 0.490235 0
baseline_24_hour_0002 0.656220 1 0.367180 1 0.766945 0 0.483677 1
baseline_24_hour_0012 0.739897 1 0.435823 1 0.592386 0 0.481419 0
baseline_24_hour_0013 0.620977 1 0.349266 1 0.888115 0 0.481225 1
baseline_24_hour_0017 0.634695 5 0.354291 6 0.832869 0 0.480335 6
baseline_24_hour_0027 0.750448 1 0.449738 1 0.805014 10 0.476830 2
baseline_24_hour_0016 0.726918 7 0.419985 7 0.708449 4 0.476242 8
baseline_24_hour_0011 0.601456 0 0.339275 0 0.888115 1 0.475635 0
baseline_24_hour_0025 0.593753 5 0.336167 5 0.887187 1 0.474761 5
baseline_24_hour_0024 0.727551 3 0.420328 3 0.608171 2 0.466529 3
baseline_24_hour_0003 0.578981 1 0.326399 1 0.880687 0 0.463854 1
baseline_24_hour_0019 0.541522 3 0.313341 3 0.887187 4 0.458432 3
baseline_24_hour_0029 0.727973 2 0.396787 2 0.735376 4 0.456958 1
baseline_24_hour_0007 0.655587 1 0.355091 1 0.631383 1 0.454545 1
baseline_24_hour_0014 0.576765 2 0.318535 0 0.837512 1 0.452256 0
baseline_24_hour_0006 0.514298 1 0.302468 1 0.882544 0 0.448941 1
baseline_24_hour_0026 0.505434 0 0.298745 0 0.908542 9 0.445128 0
baseline_24_hour_0018 0.546164 4 0.303816 0 0.779480 2 0.436116 0
baseline_24_hour_0020 0.591326 4 0.306811 4 0.894150 0 0.434435 3
baseline_24_hour_0015 0.523267 0 0.296768 0 0.801300 0 0.433124 0
baseline_24_hour_0021 0.479582 3 0.286308 3 0.897864 4 0.430090 3
baseline_24_hour_0023 0.671204 2 0.354595 2 0.891829 0 0.429513 2
baseline_24_hour_0004 0.523794 0 0.287439 0 0.740483 0 0.414124 0
baseline_24_hour_0001 0.561465 0 0.296793 0 0.954967 1 0.412994 0
baseline_24_hour_0005 0.728606 0 0.340214 0 0.591922 1 0.407739 1
baseline_24_hour_0000 0.532658 0 0.273812 0 0.808728 1 0.407390 1
baseline_24_hour_0010 0.700327 0 0.369482 0 0.450789 0 0.406106 0
baseline_24_hour_0009 0.457951 0 0.251768 1 0.776695 1 0.380271 1
baseline_24_hour_0008 0.408674 1 0.226010 0 0.849582 0 0.357038 0
Appendix 5 - Strategy Results¶
return long_only_hold predictions holding_begin_period holding_end_period strategy_return
unix
2018-02-27 08:00:00+00:00 0.002905 0.002905 0 0.0 0 0.000000
2018-02-27 09:00:00+00:00 -0.002501 0.000404 0 0.0 0 0.000000
2018-02-27 10:00:00+00:00 -0.014487 -0.014083 1 0.0 1 0.000000
2018-02-27 11:00:00+00:00 0.005351 -0.008732 0 1.0 0 0.005351
2018-02-27 12:00:00+00:00 -0.000102 -0.008834 0 0.0 0 0.005351
2018-02-27 13:00:00+00:00 0.007395 -0.001439 0 0.0 0 0.005351
2018-02-27 14:00:00+00:00 -0.010175 -0.011614 0 0.0 0 0.005351
2018-02-27 15:00:00+00:00 -0.011429 -0.023043 0 0.0 0 0.005351
2018-02-27 16:00:00+00:00 0.004129 -0.018913 0 0.0 0 0.005351
2018-02-27 17:00:00+00:00 -0.005636 -0.024550 0 0.0 0 0.005351
2018-02-27 18:00:00+00:00 0.004113 -0.020437 0 0.0 0 0.005351
2018-02-27 19:00:00+00:00 0.005157 -0.015280 0 0.0 0 0.005351
2018-02-27 20:00:00+00:00 0.000057 -0.015223 0 0.0 0 0.005351
2018-02-27 21:00:00+00:00 0.006820 -0.008403 0 0.0 0 0.005351
2018-02-27 22:00:00+00:00 -0.009023 -0.017426 1 0.0 1 0.005351
2018-02-27 23:00:00+00:00 -0.004112 -0.021538 1 1.0 1 0.001239
2018-02-28 00:00:00+00:00 0.006292 -0.015246 1 1.0 1 0.007532
2018-02-28 01:00:00+00:00 0.004858 -0.010388 1 1.0 1 0.012390
2018-02-28 02:00:00+00:00 -0.000897 -0.011285 1 1.0 1 0.011493
2018-02-28 03:00:00+00:00 0.007582 -0.003702 0 1.0 0 0.019075
2018-02-28 04:00:00+00:00 -0.004769 -0.008471 0 0.0 0 0.019075
2018-02-28 05:00:00+00:00 0.000668 -0.007803 0 0.0 0 0.019075
2018-02-28 06:00:00+00:00 -0.008709 -0.016512 0 0.0 0 0.019075
2018-02-28 07:00:00+00:00 -0.000845 -0.017358 0 0.0 0 0.019075
2018-02-28 08:00:00+00:00 -0.013752 -0.031110 0 0.0 0 0.019075
2018-02-28 09:00:00+00:00 0.003586 -0.027524 0 0.0 0 0.019075
2018-02-28 10:00:00+00:00 -0.002150 -0.029674 0 0.0 0 0.019075
2018-02-28 11:00:00+00:00 0.001052 -0.028622 0 0.0 0 0.019075
2018-02-28 12:00:00+00:00 -0.007367 -0.035988 1 0.0 1 0.019075
2018-02-28 13:00:00+00:00 0.007182 -0.028807 0 1.0 0 0.026257
2018-02-28 14:00:00+00:00 0.005074 -0.023733 0 0.0 0 0.026257
2018-02-28 15:00:00+00:00 -0.001877 -0.025609 0 0.0 0 0.026257
2018-02-28 16:00:00+00:00 -0.004724 -0.030334 1 0.0 1 0.026257
2018-02-28 17:00:00+00:00 0.002844 -0.027489 1 1.0 1 0.029101
2018-02-28 18:00:00+00:00 -0.000323 -0.027813 1 1.0 1 0.028778
2018-02-28 19:00:00+00:00 0.002457 -0.025356 0 1.0 0 0.031235
2018-02-28 20:00:00+00:00 0.001669 -0.023687 0 0.0 0 0.031235
2018-02-28 21:00:00+00:00 -0.003191 -0.026878 0 0.0 0 0.031235
2018-02-28 22:00:00+00:00 -0.008936 -0.035814 0 0.0 0 0.031235
2018-02-28 23:00:00+00:00 -0.010710 -0.046523 0 0.0 0 0.031235
2018-03-01 00:00:00+00:00 0.005422 -0.041101 0 0.0 0 0.031235
2018-03-01 01:00:00+00:00 0.001883 -0.039219 0 0.0 0 0.031235
2018-03-01 02:00:00+00:00 -0.003429 -0.042648 0 0.0 0 0.031235
2018-03-01 03:00:00+00:00 0.002891 -0.039756 0 0.0 0 0.031235
2018-03-01 04:00:00+00:00 0.002894 -0.036862 0 0.0 0 0.031235
2018-03-01 05:00:00+00:00 -0.000245 -0.037107 1 0.0 1 0.031235
2018-03-01 06:00:00+00:00 0.000909 -0.036198 1 1.0 1 0.032144
2018-03-01 07:00:00+00:00 0.005077 -0.031121 1 1.0 1 0.037220
2018-03-01 08:00:00+00:00 0.005512 -0.025609 1 1.0 1 0.042732
2018-03-01 09:00:00+00:00 -0.003590 -0.029200 1 1.0 1 0.039142
2018-03-01 10:00:00+00:00 0.006731 -0.022468 0 1.0 0 0.045873
2018-03-01 11:00:00+00:00 -0.001932 -0.024400 0 0.0 0 0.045873
2018-03-01 12:00:00+00:00 -0.006420 -0.030820 0 0.0 0 0.045873
2018-03-01 13:00:00+00:00 0.003965 -0.026855 0 0.0 0 0.045873
2018-03-01 14:00:00+00:00 -0.001397 -0.028252 0 0.0 0 0.045873
2018-03-01 15:00:00+00:00 -0.004006 -0.032258 0 0.0 0 0.045873
2018-03-01 16:00:00+00:00 0.004606 -0.027651 0 0.0 0 0.045873
2018-03-01 17:00:00+00:00 0.008039 -0.019612 0 0.0 0 0.045873
2018-03-01 18:00:00+00:00 0.004560 -0.015052 0 0.0 0 0.045873
2018-03-01 19:00:00+00:00 -0.004961 -0.020013 0 0.0 0 0.045873
2018-03-01 20:00:00+00:00 0.004642 -0.015371 0 0.0 0 0.045873
2018-03-01 21:00:00+00:00 -0.006523 -0.021894 0 0.0 0 0.045873
2018-03-01 22:00:00+00:00 -0.001298 -0.023192 0 0.0 0 0.045873
2018-03-01 23:00:00+00:00 -0.001288 -0.024481 0 0.0 0 0.045873
2018-03-02 00:00:00+00:00 0.000288 -0.024193 0 0.0 0 0.045873
2018-03-02 01:00:00+00:00 0.004054 -0.020139 0 0.0 0 0.045873
2018-03-02 02:00:00+00:00 0.000023 -0.020116 0 0.0 0 0.045873
2018-03-02 03:00:00+00:00 -0.002008 -0.022124 0 0.0 0 0.045873
2018-03-02 04:00:00+00:00 -0.002069 -0.024193 0 0.0 0 0.045873
2018-03-02 05:00:00+00:00 0.001104 -0.023089 0 0.0 0 0.045873
2018-03-02 06:00:00+00:00 0.002124 -0.020965 0 0.0 0 0.045873
2018-03-02 07:00:00+00:00 -0.003240 -0.024204 0 0.0 0 0.045873
2018-03-02 08:00:00+00:00 0.004180 -0.020024 0 0.0 0 0.045873
2018-03-02 09:00:00+00:00 -0.011595 -0.031620 0 0.0 0 0.045873
2018-03-02 10:00:00+00:00 -0.002612 -0.034232 0 0.0 0 0.045873
2018-03-02 11:00:00+00:00 -0.000581 -0.034813 0 0.0 0 0.045873
2018-03-02 12:00:00+00:00 0.003622 -0.031191 0 0.0 0 0.045873
2018-03-02 13:00:00+00:00 -0.000429 -0.031620 0 0.0 0 0.045873
2018-03-02 14:00:00+00:00 -0.011532 -0.043152 0 0.0 0 0.045873
2018-03-02 15:00:00+00:00 0.000867 -0.042284 0 0.0 0 0.045873
2018-03-02 16:00:00+00:00 0.000679 -0.041605 0 0.0 0 0.045873
2018-03-02 17:00:00+00:00 -0.000375 -0.041980 0 0.0 0 0.045873
2018-03-02 18:00:00+00:00 -0.001313 -0.043292 0 0.0 0 0.045873
2018-03-02 19:00:00+00:00 -0.004679 -0.047972 1 0.0 1 0.045873
2018-03-02 20:00:00+00:00 0.007713 -0.040259 0 1.0 0 0.053586
2018-03-02 21:00:00+00:00 0.004620 -0.035639 0 0.0 0 0.053586
2018-03-02 22:00:00+00:00 -0.000699 -0.036338 0 0.0 0 0.053586
2018-03-02 23:00:00+00:00 -0.003746 -0.040084 1 0.0 1 0.053586
2018-03-03 00:00:00+00:00 0.003105 -0.036979 1 1.0 1 0.056691
2018-03-03 01:00:00+00:00 0.007524 -0.029454 0 1.0 0 0.064216
2018-03-03 02:00:00+00:00 -0.001076 -0.030531 0 0.0 0 0.064216
2018-03-03 03:00:00+00:00 -0.004085 -0.034615 0 0.0 0 0.064216
2018-03-03 04:00:00+00:00 0.003135 -0.031481 0 0.0 0 0.064216
2018-03-03 05:00:00+00:00 -0.002065 -0.033546 0 0.0 0 0.064216
2018-03-03 06:00:00+00:00 0.001010 -0.032536 0 0.0 0 0.064216
2018-03-03 07:00:00+00:00 -0.004349 -0.036885 0 0.0 0 0.064216
2018-03-03 08:00:00+00:00 0.003792 -0.033093 0 0.0 0 0.064216
2018-03-03 09:00:00+00:00 -0.001580 -0.034673 0 0.0 0 0.064216
2018-03-03 10:00:00+00:00 0.000953 -0.033720 0 0.0 0 0.064216
2018-03-03 11:00:00+00:00 0.001045 -0.032675 0 0.0 0 0.064216