CQF June 2023 Intake: Final Project

Deep Learning for Asset Prediction

Will Colgate, Singapore, January 2024

Introduction¶

Using machine and deep learning techniques to predict price movements is considered the holy grail of modern finance, attracting attention from individual professionals and hobbyists through to large multinational businesses, with mixed results. Given the stochastic nature of returns, it is arguable whether a reliable algorithm can be found that is actually effective in the "wild".

Operational complexities such as timely access to information, the brokerage spread and transaction costs make the task difficult before even considering the more philosophical question of how efficient the markets are. This is ratcheted up a notch by the irrational movements of crypto markets, which arguably have no underlying inherent value. Of course, this depends on who you ask. What is true is that crypto markets are very volatile, offering opportunities for large gains for those brave enough to take the risk.

Applying deep learning to these markets, whilst offering an interesting academic problem to explore, is unlikely to offer additional insight, and the below should not be used as a basis for any investment decisions.

A Note on the Project Workflow¶

The approach to this problem will follow the well-trodden machine learning workflow:

  • Problem statement
  • Data collection
  • Exploratory data analysis
  • Data cleaning
  • Feature scaling and selection
  • Model design and hypertuning
  • Model evaluation

Given the subject matter is financial time series forecasting, the report will also include backtesting of the predictions versus a long-only hold strategy, to understand whether it achieves its objective of outperforming the market.

A full process diagram of how the problem was approached and the model built is included in Appendix 1.

Problem Statement¶

The objective is to produce a model that can predict positive moves using Long Short-Term Memory (LSTM) networks in short term financial time series.

I have chosen Ethereum (ETH) as the ticker to analyse (technically a pair with USD). Crypto markets are notoriously volatile and it seems like a decent challenge to try and tease some insight out of the mess.

For this purpose, I will aim to predict an hourly positive return. Defining a positive return is discussed in more detail as part of the labels section. This will be a binary classification problem with 1 being the label for a positive move and 0 otherwise.

Whilst the accuracy of the predictions will be an important metric, precision, recall and F1 will arguably be more important as measures of success. The precision on upward moves in particular appears important, as a false positive translates into a realised financial loss from buying and then selling at a loss in a high frequency setting. A thorough discussion of metrics is considered in the following pages.

Data Collection¶

Raw Data¶

Access to data is one of the biggest challenges for deep learning problems. The amount of data required to train a (good) deep learning neural network is usually much more than is available outside of a professional setting. High frequency intraday data especially is difficult to come by, presumably due to the differences between exchanges, the cost of storage and how valuable the data can be.

After exploring Yahoo Finance (via the yfinance python package) and the Alpha Vantage API, it became apparent that these sources did not have a sufficient quantity of reliable data.

In the end, the data was sourced from https://www.cryptodatadownload.com/data/ from the data available from the Gemini exchange.

[Figure: ETH/USD hourly close price from the Gemini exchange]

The data appears to be relatively complete on an arbitrary inspection and shows the meteoric rise of the crypto markets generally in 2021 and 2022, followed by the collapse in price due to the FTX and LUNA scandals. The raw data goes back to 2016 but is truncated in the above chart for the reasons explained in the next few paragraphs.

The crypto market is notoriously emotion driven; even a glance at social media or news outlets gives a sense of how true this is. It follows that some measure of sentiment capturing this emotional investing could give interesting insight into the problem statement. There is a useful resource, updated daily on alternative.me, called the Fear and Greed Index.

The index takes a weighted approach to a number of factors across 5 (formerly 6) data sources. A numerical value is assigned which falls into one of the following categories:

  • Extreme Fear
  • Fear
  • Neutral
  • Greed
  • Extreme Greed

The index is updated daily at 00:00 UTC.

fg_value fg_value_classification
timestamp
2023-12-23 00:00:00+00:00 70 greed
2023-12-22 00:00:00+00:00 74 greed
2023-12-21 00:00:00+00:00 70 greed
2023-12-20 00:00:00+00:00 74 greed
2023-12-19 00:00:00+00:00 73 greed

The index could be a good indicator of sentiment in the crypto market as a whole. Crypto tokens do not have fundamental data so a traditional fundamental analysis cannot be undertaken. However, there are metrics associated with blockchains that can be accessed (such as transactions per second, blocks mined etc) that could be worth exploring in a further analysis but are outside the scope of this paper.

The index only began on 1 February 2018, so all price data before this date has been dropped and the daily metric forward filled to the hourly data. This reflects the assumption that the index applies to price movements throughout the day.

Feature Engineering¶

Feature engineering is the catch-all term for using domain knowledge to generate insights from the raw dataset. Common data transformations for financial time series are known as technical analysis, with associated literature that spans many volumes.

Using pandas-ta, I have generated standard technical indicators for the data based on high, low, open, close and volume. I have also generated temporal data to investigate whether there is any kind of seasonality to returns. I have used the scikit-learn OneHotEncoder to encode these values (and the FG classification) into binary variables.

Interestingly, the pandas-ta module implements TA-Lib candlestick patterns. Reading candlesticks is a classical form of technical analysis, originating from the rice markets of Japan. It will be interesting to see which patterns, if any, lead to an upward tick.
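
As an illustration of this step, below is a minimal sketch assuming an hourly OHLCV DataFrame `df` indexed by a UTC timestamp. Only a handful of indicators are shown (the full feature set is far larger), the candlestick pattern call requires TA-Lib to be installed, and the `sparse_output` argument assumes scikit-learn 1.2 or later.

```python
import pandas as pd
import pandas_ta as ta  # technical analysis indicators
from sklearn.preprocessing import OneHotEncoder

# df: hourly OHLCV DataFrame indexed by a UTC DatetimeIndex (assumed)
df.ta.rsi(length=14, append=True)              # momentum indicator
df.ta.sma(length=10, append=True)              # trend indicator
df.ta.cdl_pattern(name="doji", append=True)    # candlestick pattern (needs TA-Lib)

# Temporal features for the seasonality investigation
temporal = pd.DataFrame(
    {"hour": df.index.hour, "day_of_week": df.index.dayofweek, "month": df.index.month},
    index=df.index,
)

# One-hot encode the temporal columns into binary variables
encoder = OneHotEncoder(sparse_output=False)
encoded = pd.DataFrame(
    encoder.fit_transform(temporal),
    columns=encoder.get_feature_names_out(temporal.columns),
    index=df.index,
)
features = pd.concat([df, encoded], axis=1)
```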

Labelling the Data¶

Given that the problem statement is to predict an hourly positive return, the 1-period return is calculated as follows, where $p$ is the closing price:

$$ r_t = \ln\left(\frac{p_t}{p_{t-1}}\right) $$

A practical approach to predicting a positive return for these purposes would be any net return (i.e. after transaction costs).

Here are the fees from the Gemini exchange for reference. The taker fee at the lowest volume per month is 0.4%. To account for interest on margin, I will round this up to 0.5% as an estimate.

Therefore, a label of 1 will mean that the upward return in the next hour is greater than 0.5% and 0 otherwise. Mathematically:

$$ y_t = \begin{cases} 1 & \quad \text{if } r_{t+1} > 0.005\\ 0 & \quad \text{otherwise} \end{cases} $$
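
A short sketch of the labelling step, assuming the hourly DataFrame `df` from above with a `close` column; the 0.005 threshold is the estimated round-trip cost discussed above.

```python
import numpy as np

FEE_THRESHOLD = 0.005  # 0.4% taker fee rounded up to 0.5% to allow for margin interest

# 1-period log return of the close price
df["return"] = np.log(df["close"] / df["close"].shift(1))

# Label: 1 if the *next* hour's return exceeds the fee threshold, else 0
df["label"] = (df["return"].shift(-1) > FEE_THRESHOLD).astype(int)

# The last row has no forward return to label, so drop it
df = df.iloc[:-1]
```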

Exploratory Data Analysis¶

EDA is an important step in any machine learning workflow. Initial analysis of the data after engineering the technical indicators indicated that some features needed to be removed. This is discussed further in the cleaning section.

[Figure: Histograms of the hourly return and fg_value]

Some observations of the above:

  • A simple histogram of the return and the fg_value indicates, as expected, that the returns are clustered around 0, but there are some significant outliers and high peaks.

  • The fear and greed value is distributed towards the lower end, possibly indicating that overall, fear dominated the crypto market in the time period in question.

[Figure: Pairwise plots of return, fg_value and volume]

Some observations from the above data:

  • Return does not seem to be correlated with fg_value. However, at higher fg_values (i.e. more greed in the index), the variance of returns appears to decrease (heteroscedasticity).
  • There appears to be a slight negative correlation between fg_value and volume, indicating that less volume is traded during times of greed in the index.
  • Return does not have significant outliers, but volume does look like a candidate for the robust scaler.
label
0    40858
1     9809
Name: count, dtype: int64

The above does indicate that there is quite a severe class imbalance in the data that will need to be addressed at the model building stage otherwise the model will likely underperform due to bias.

Next, I have examined the correlation between features. Collinearity between features is present and there are a number of features that will need cleaning before they can be used in the model.

[Figure: Correlation heatmap of the cleaned features]

The heatmap of cleaned features shows clear collinearity between a number of features. Collinearity in ML problems affects performance and interpretability and so is generally best removed. There are multiple methods of doing this but in this paper, I have focused on the following two methods:

  • Only retaining the first variable in a highly correlated pair
  • Discarding variables with a variance inflation factor of greater than 5

The first method is self-explanatory; the second is defined as:

$$ VIF_i = \frac{1}{1-R^{2}_{i}} $$

Where $R^{2}_{i}$ is the unadjusted coefficient of determination from regressing variable $i$ on all the remaining independent variables. A VIF equal to 1 indicates that the variable is not correlated with the others; a value between 1 and 5 indicates moderate correlation with the other variables; and a value greater than 5 indicates high correlation with other variables.
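
A sketch of the VIF filter is below, using the statsmodels implementation; `X` is assumed to be the DataFrame of candidate numeric features. It simply iterates the definition above, dropping the worst offender until every remaining VIF is at or below 5.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Iteratively drop the feature with the highest VIF until all VIFs <= threshold."""
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        worst = vifs.idxmax()
        if vifs[worst] <= threshold:
            return X
        X = X.drop(columns=[worst])  # remove the most collinear feature and re-check
```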

Data Cleaning¶

As already alluded to in the preceding sections, the joining of the two raw data sources and the computation of the technical indicators require some cleaning. A number of steps have been taken to deal with this in the clean_scale.py script. These are:

  • Drop all columns where pandas.ta has calculated NaN.
  • Drop columns where 20,000 datapoints (out of 65k+) are missing.
  • Remove the leading NaN rows of data due to calculation of rolling amounts (simple moving averages etc.).
  • Removal of all columns with no variance, indicating a single value, since a constant column carries no information for the algorithm.

After this process, there are 324 features remaining in the dataset with hourly data from 26/02/2018 until 11/12/2023.

Feature Scaling¶

In order to get the best results from deep learning models, data generally needs to be scaled to aid in faster calculation of cost functions during gradient descent. There are various scaling techniques commonly used but this paper concentrates on two.

Min Max Scaler¶

The min max scaler rescales all features to within a range based on the following calculation:

$$ x_{scaled} = \frac{x_i - min(x)}{max(x) - min(x)} $$

This scaler is relatively sensitive to outliers but is generally good on many financial time series problems.

Robust Scaler¶

The Robust Scaler scales variables with significant outliers by using the quartiles of $x$ to scale the variables.

$$ x_{scaled} = \frac{x_i - Q_{1}(x)}{Q_{3}(x) - Q_{1}(x)} = \frac{x_i - Q_{1}(x)}{IQR(x)} $$

A significant outlier is defined for the purposes of this paper as one that is more than 10 times the IQR. Appendix 2 lists each feature remaining after cleaning the data and the scaler applied to each.

Split Data into Train and Test¶

Before applying the chosen scaling methods to each column, the data needs to be split into train and test sets. The scaling algorithm is fit to the training data only, and the test data is scaled using metrics calculated on the training data. This avoids leakage of the test data into the training dataset and also helps with regularization. Note that as this is time series data, the data should not be shuffled.

Once the train and test data are split, the train data is split again into a train and validation set to test the model during the training process.

All sets are then scaled in clean_scale.py using $x$ values calculated from the training set only to avoid any data leakage.
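
A minimal sketch of the split-then-scale logic is below, assuming the engineered DataFrame `features` and a hypothetical, abbreviated `scaler_map` recording the scaler chosen for each column per Appendix 2. The split proportions are illustrative.

```python
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Chronological split -- no shuffling for time series data
n = len(features)
train = features.iloc[: int(0.70 * n)].copy()
val = features.iloc[int(0.70 * n): int(0.85 * n)].copy()
test = features.iloc[int(0.85 * n):].copy()

# Hypothetical mapping of column -> scaler class (see Appendix 2), abbreviated here
scaler_map = {"close": MinMaxScaler, "volume": RobustScaler}

for col, scaler_cls in scaler_map.items():
    scaler = scaler_cls()
    # Fit on the training data only, then apply the same transform everywhere
    train[[col]] = scaler.fit_transform(train[[col]])
    val[[col]] = scaler.transform(val[[col]])
    test[[col]] = scaler.transform(test[[col]])
```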

Feature Selection¶

There are many (and varied) techniques to feature selection and feature engineering as a whole, necessitating experimentation to try and meet the main objective of an efficient model with good predictive powers.

To attempt to achieve this, I have split the problem into four stages:

  1. Removal of collinearity using one of the two techniques described above.
  2. Use of boruta as a feature selection algorithm.
  3. Dimension reduction using Uniform Manifold Approximation & Projection (UMAP), a relatively new and novel unsupervised learning algorithm.
  4. Input the results of the above pipeline (or part thereof) into a baseline one-layer LSTM model using keras and analyse the results.

The best performing pipeline above will be chosen to test other model architectures.

Boruta¶

The Boruta algorithm is designed around a random forest classifier. It seeks to establish which features contribute to the overall model. It does this by duplicating and shuffling the dataset into "shadow features". The classifier (in this case, a random forest) is then trained on both sets of data and each feature's importance is compared to that of the shadow features. If a feature has greater importance than its shadow equivalent, it is retained.

The algorithm is implemented in Python through the boruta_py package. In order to capture as much data as possible, the "perc" parameter was set at 90, in line with the documentation, so as to avoid too strict an interpretation of importance.
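
A sketch of how boruta_py might be run for this step, assuming scaled training features `X_train` (a DataFrame) and labels `y_train`; BorutaPy expects numpy arrays, and the estimator settings shown are illustrative.

```python
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

# Random forest as the underlying estimator; settings here are illustrative
rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)

# perc=90: a feature is kept if its importance beats the 90th percentile of the
# shadow-feature importances (slightly less strict than the default of 100)
selector = BorutaPy(rf, n_estimators="auto", perc=90, random_state=42)
selector.fit(X_train.values, y_train.values)

retained = X_train.columns[selector.support_].tolist()
X_train_boruta = X_train[retained]
```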

UMAP¶

Per the documentation, UMAP is a dimension reduction technique that can be used for visualisation similarly to t-distributed Stochastic Neighbor Embedding (t-SNE), but also for general non-linear dimension reduction. The mathematics can be found in McInnes, L. and Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. According to the literature, UMAP offers significant performance improvements over other dimensionality reduction techniques such as t-SNE and SOM. PCA was discounted in this analysis as it does not tend to work well on non-linear data.

The python implementation of UMAP takes a number of parameters, but there are two main parameters that impact the clustering of the algorithm on the 2D plane.

| Hyperparameter | Description | Value |
|---|---|---|
| n_neighbors | Controls how UMAP balances local versus global structure in the data by constraining the size of the local neighbourhood UMAP looks at when attempting to learn the manifold structure. Lower values of n_neighbors force the algorithm to focus more on local structure, potentially losing some of the global structure, and vice versa. | 10 |
| min_dist | Controls how tightly UMAP is allowed to pack points together in the 2D representation. A lower min_dist generally means that points will clump together more. The choice of min_dist depends on the use case, with a lower value generally being more useful for clustering problems. | 0.1 (default) |

Appendix 3 explores the output of the algorithm on changing the above parameters and demonstrates why the above were chosen as a middle ground on local and global structure to use for dimension reduction of the dataset.
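
For reference, a minimal sketch of applying UMAP with the chosen parameters; `X_train_sel` and `X_test_sel` are assumed to be the scaled, selected feature matrices, and `n_components=2` matches the 2D representation discussed above.

```python
import umap

# n_neighbors=10 as a middle ground between local and global structure;
# min_dist left at its default of 0.1
reducer = umap.UMAP(n_neighbors=10, min_dist=0.1, n_components=2, random_state=42)

# Fit the embedding on the training data only, then project the test data
emb_train = reducer.fit_transform(X_train_sel)   # shape: (n_samples, 2)
emb_test = reducer.transform(X_test_sel)
```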

Baseline Model¶

In order to evaluate the various pipelines and understand the best set of features for this particular problem statement, a baseline deep learning model was used. This consisted of a single LSTM layer with 36 units and 'relu' activation, being:

$$ x^+ = \begin{cases} x & \quad \text{if } x > 0\\ 0 & \quad \text{otherwise} \end{cases} $$

The past 6 hours (i.e. the current and preceding 5) of data was used as the sequence length, on the basis that this seemed a reasonable amount of time for predicting an up movement in 1 hour's time while also being a good middle ground in terms of performance. The model structure is detailed in the diagram below.

[Figure: Baseline model architecture]

The model input is a 3-dimensional tensor of the form (batch, sequence length, features). The output of the dense layer is a probability $p$ that the label will be 1, such that:

$$ \text{prediction} = \begin{cases} 1 & \quad \text{if } p > 0.5\\ 0 & \quad \text{otherwise} \end{cases} $$
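
The windowed tensors can be produced with the keras timeseries utility function referred to later in the report; a sketch is below, assuming a scaled feature array `X_train` and a label array `y_train` aligned row-for-row. The target for a window ending at row t is y[t], hence the offset of SEQ_LEN - 1.

```python
import tensorflow as tf

SEQ_LEN = 6  # the current hour plus the preceding 5

train_ds = tf.keras.utils.timeseries_dataset_from_array(
    data=X_train,                   # shape (n_samples, n_features)
    targets=y_train[SEQ_LEN - 1:],  # label aligned to the *end* of each window
    sequence_length=SEQ_LEN,
    batch_size=64,
    shuffle=False,                  # preserve temporal order
)
```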

The optimizer used is Adam (see Kingma et al. 2014) and the loss function is binary cross entropy, being the most appropriate loss function for evaluating binary classification problems.

Before training, the class imbalance was dealt with by assigning a weight to each class. The weights are calculated by taking 1 divided by the count of the class, multiplied by the total length of the array divided by 2. The resultant python dictionary is passed to the keras fit method as the class_weight argument.

An early stopping callback monitoring the validation loss was used (i.e. the model will stop training 10 epochs after the validation loss has ceased decreasing).
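
Putting the pieces together, a sketch of the baseline model, class weights and callback is below; `SEQ_LEN`, `n_features`, `y_train`, `train_ds` and `val_ds` are assumed to exist from the earlier steps.

```python
import numpy as np
import tensorflow as tf

# Class weights: (1 / class count) * (total / 2), as described above
counts = np.bincount(y_train)
total = len(y_train)
class_weight = {0: (1 / counts[0]) * (total / 2.0),
                1: (1 / counts[1]) * (total / 2.0)}

# Baseline: a single LSTM layer with 36 units and relu activation
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, n_features)),
    tf.keras.layers.LSTM(36, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.BinaryAccuracy(),
             tf.keras.metrics.Precision(),
             tf.keras.metrics.Recall()],
)

# Stop training 10 epochs after the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)

history = model.fit(train_ds, validation_data=val_ds, epochs=50,
                    class_weight=class_weight, callbacks=[early_stop])
```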

Experiment Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
run_pairwisecorr__boruta_01-14-2024-14:16:04 0.698578 13 0.392012 13 0.885132 3 0.484584 17
run_boruta_01-14-2024-14:40:32 0.581991 4 0.329438 4 0.906438 1 0.468319 4
run_vif_01-14-2024-14:23:39 0.641601 28 0.342770 28 0.928208 0 0.466315 16
run_vif__boruta_01-14-2024-14:31:03 0.583992 34 0.324927 34 0.858268 4 0.457744 2
run_vif__umap_01-14-2024-14:50:45 0.656767 11 0.345592 11 0.781843 0 0.435312 9
run_pairwisecorr_01-14-2024-14:15:30 0.562085 0 0.274124 0 0.930523 9 0.403786 16
run_all_01-14-2024-14:14:58 0.706477 2 0.353408 2 0.963872 5 0.400226 18
run_boruta__umap_01-14-2024-14:45:33 0.674882 7 0.332054 9 0.907365 0 0.391881 1
run_pairwisecorr__boruta__umap_01-14-2024-14:19:45 0.718483 10 0.374512 10 0.358962 2 0.365049 2
run_umap_01-14-2024-14:50:18 0.688046 10 0.260012 10 0.554887 1 0.331351 1

An F1 score has been calculated at each epoch using the precision and recall. The F1 metric attempts to evaluate the model on its class-wise performance and is the harmonic mean of the precision and recall scores. Mathematically:

$Precision = \frac{TP}{TP + FP}$

$Recall = \frac{TP}{TP + FN}$

$F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$

where

$TP = \text{True Positive}$

$FP = \text{False Positive}$

$FN = \text{False Negative}$

For a given class:

  • High precision and high recall - the class has been handled well by the model.
  • High precision and low recall - the class is not well detected, but when it is, the model is very reliable.
  • Low precision and high recall - the class is well detected, but the predictions also include observations of other classes.
  • Low precision and low recall - the class has not been handled well at all.

The use of F1 is due to the fact that precision and recall are often antagonistic. The F1 score measures both precision and recall in one measure. The higher the F1 score, the better the overall model in classifying both classes.

The F1 score has been used as the principal metric on which to evaluate the model as it offers a good "all encompassing" metric where neither precision nor recall are obviously more important.
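
For completeness, a small sketch of how the per-epoch F1 can be derived from the logged precision and recall, assuming the keras metric names default to "precision" and "recall" as registered above.

```python
import numpy as np

precision = np.array(history.history["val_precision"])
recall = np.array(history.history["val_recall"])

# Harmonic mean per epoch; the small epsilon avoids division by zero
f1 = 2 * precision * recall / (precision + recall + 1e-9)

best_epoch = int(f1.argmax())
print(f"Best validation F1 {f1[best_epoch]:.3f} at epoch {best_epoch}")
```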

Pipeline Selection¶

From the above results, the preprocessing that gives the highest F1 is pairwise reduction in correlation followed by using Boruta to assess the remaining features. This is the preprocessing pipeline I have adopted for the remainder of this report.

Deep Learning Model¶

A Sensible Baseline to Beat¶

In order to judge whether the model is achieving its stated objective, a sensible baseline to beat should be established. Given the class counts above, predicting a 0 every time would result in an accuracy of 0.8064 (albeit with no chance of a profit, as an investor would never take any risk). This would result in a precision and recall on class 1 of 0, which should be beatable.

If we can approach this sensible baseline, then a backtest would determine if a strategy based on these signals would result in any profit, over and above a long hold.

Tested Models¶

In line with the requirements of the project, LSTM models have been used to attempt the classification problem. An LSTM layer is a type of recurrent neural network that enables the learning of long term dependencies. They were first proposed in Hochreiter & Schmidhuber (1997) and refined since then. The model structure is included below.

[Figure: LSTM cell structure]

Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

The key to the LSTM is the cell state, which is represented by the horizontal line in the above diagram. This line carries the prior information and passes through a number of gates that have the ability to add or remove information. The models are widely used in a number of machine learning problems.

For the purposes of this paper, 1, 2 and 3 layer LSTMs were considered when evaluating the predictive power of the model. The additional layers add complexity and computing time to the model, which needs to be weighed against the predictive power.

Dropout layers, whilst useful for regularization of other ML problems, are known to hinder learning in RNNs. In line with Gal & Ghahramani (2016), the recurrent dropout parameter was used to introduce dropout into the models. This parameter uses the same dropout mask on each unit; using the same dropout mask at every timestep allows the network to properly propagate its learning error through time, whereas a temporally random dropout would disrupt this error (Deep Learning with Python, Chollet).
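
A sketch of how the stacked variants might be built with recurrent dropout is below; `SEQ_LEN` and `n_features` are assumed as before, and the unit count and dropout rate shown are illustrative.

```python
import tensorflow as tf

def build_lstm(n_layers: int, units: int = 36, rec_dropout: float = 0.2) -> tf.keras.Model:
    """1-, 2- or 3-layer LSTM with the same recurrent dropout mask at every timestep."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(SEQ_LEN, n_features)))
    for i in range(n_layers):
        model.add(tf.keras.layers.LSTM(
            units,
            recurrent_dropout=rec_dropout,
            return_sequences=(i < n_layers - 1),  # only the final layer emits a single vector
        ))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

two_layer_dropout = build_lstm(n_layers=2)
```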

Each of these models was tested using 4 lookback periods, being 1, 6, 12 and 24 hours. The results of the experiments are presented below.

Experiment Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
baseline_24_hour 0.712251 7 0.405663 7 0.862117 5 0.493919 11
two_layer_dropout_24_hour 0.658436 18 0.372197 18 0.912256 1 0.493506 18
two_layer_12_hour 0.686268 22 0.388934 22 0.906858 1 0.490676 22
three_layer_dropout_12_hour 0.623353 11 0.353416 11 0.910565 9 0.488552 11
baseline_dropout_24_hour 0.688720 17 0.389321 17 0.896472 24 0.486957 17
two_layer_dropout_6_hour 0.642654 30 0.359836 30 0.935618 0 0.482853 30
three_layer_12_hour 0.641058 25 0.358053 25 0.933735 1 0.480317 25
baseline_12_hour 0.616503 16 0.345375 16 0.886932 7 0.476177 16
baseline_dropout_6_hour 0.612428 28 0.342449 28 0.900417 0 0.475862 20
three_layer_dropout_6_hour 0.609689 38 0.341398 38 0.942103 1 0.473280 38
two_layer_6_hour 0.638968 6 0.353903 6 0.908754 12 0.472778 6
two_layer_24_hour 0.645141 28 0.354513 28 0.902971 0 0.470426 29
two_layer_dropout_12_hour 0.580883 23 0.327153 31 0.900371 1 0.468639 31
three_layer_6_hour 0.583886 10 0.329333 10 0.911070 0 0.466730 10
baseline_dropout_12_hour 0.579408 25 0.327369 25 0.926784 6 0.465515 25
baseline_6_hour 0.600527 21 0.332238 21 0.902733 0 0.463391 17
baseline_1_hour 0.626526 23 0.340106 23 0.874711 3 0.461673 25
baseline_dropout_1_hour 0.629579 24 0.342326 24 0.874711 3 0.461654 18
three_layer_dropout_1_hour 0.565895 5 0.321109 5 0.926491 0 0.460492 5
three_layer_1_hour 0.565579 5 0.320999 5 0.926491 0 0.460452 5
two_layer_dropout_1_hour 0.515579 0 0.299655 0 0.917245 3 0.443265 8
two_layer_1_hour 0.515579 0 0.299655 0 0.917245 3 0.442980 8
three_layer_24_hour 0.564419 2 0.281820 4 0.972609 9 0.410888 14
three_layer_dropout_24_hour 0.475150 2 0.250718 6 0.991643 19 0.387369 6

The above indicates that the baseline model with a lookback period of 24 hours performed the best in terms of accuracy and the F1 statistic (on the validation data). This model structure will be selected as the final model for hyperparameter tuning.

Hypertuning Strategy¶

Having determined the model structure that results in the highest F1 score, the hyperparameters of the model are tuned in order to try and improve performance. Using Keras Tuner for this task, the first decision is the choice in tuning algorithm.

| Tuner | Description |
|---|---|
| RandomSearch | Chooses hyperparameters at random. Computationally expensive and completely random (as the name suggests) in finding a best set of hyperparameters. |
| GridSearch | Similar to the grid search in Scikit-Learn, this tuner attempts every possible combination of hyperparameters to find the best. Again, computationally (very) expensive. |
| BayesianOptimization | Uses a Bayesian approach to optimization based on a Gaussian process. |
| Hyperband | The Keras Tuner implementation of Hyperband, a novel bandit-based approach to optimization based on the paper by Li, Lisha, and Kevin Jamieson, "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization". |

According to the algorithm's authors, Hyperband offers performance advantages over the other available tuning algorithms. This is the tuner I have used to tune the model.

The parameters to be tuned are:

| Parameter | Description |
|---------------------|---------------------------------------------------------------------------------------------------------------------------|
| Units in LSTM layer | The number of units for the LSTM layer |
| Activation function | A choice of relu, elu, tanh, sigmoid or selu |
| Learning rate | The learning rate of the gradient descent. A float between 0.0005 and 0.01 |
| Beta 1 | The exponential decay rate for the first moment estimates. A float between 0.5 and 0.99 |
| Beta 2 | The exponential decay rate for the second-moment estimates. Generally should be set close to 1. A float between 0.5 and 0.9999 |

Given the validation loss of the model seems to minimise at around epoch 7, the max epochs parameter for the tuner was set at 15. This allows enough headroom for testing hyperparameters without being overly wasteful of resources.
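
A sketch of the Hyperband setup with Keras Tuner is shown below, covering the search space in the table above; the 24-hour lookback, dataset objects and class weights are assumed from earlier, and the directory and project names are illustrative.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(24, n_features)),   # 24-hour lookback
        tf.keras.layers.LSTM(
            hp.Int("units", min_value=32, max_value=512, step=32),
            activation=hp.Choice("activation", ["relu", "elu", "tanh", "sigmoid", "selu"]),
        ),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=hp.Float("learning_rate", 5e-4, 1e-2, sampling="log"),
        beta_1=hp.Float("beta_1", 0.5, 0.99),
        beta_2=hp.Float("beta_2", 0.5, 0.9999),
    )
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
    return model

tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=15,
                     directory="tuning", project_name="eth_lstm")
tuner.search(train_ds, validation_data=val_ds, class_weight=class_weight)
best_hp = tuner.get_best_hyperparameters(1)[0]
```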

After the tuning process, trial 28 was chosen and the final parameters were:

| Parameter | Value |
|---|---|
| Units | 400 |
| Activation | tanh |
| Learning rate | 0.00058215 |

Both betas were tuned but lost in the output when generating the model. All the results of the hypertuning can be found in Appendix 4.

Final Model¶
[Figure: Final model architecture]

Model Evaluation¶

To evaluate the model, it is tested on the data which has been held back. The scaled data is reimported and put through the pairwise correlation and boruta pipeline. The result is converted to a tensorflow Dataset using the timeseries utility function, giving the same form of dataset as used in the experiments described above.

retained_feature
0 open
1 high
2 ABER_XG_5_15
3 ACCBL_20
4 ACCBU_20
5 ADX_14
6 AGj_13_8_5
7 ALPHAT_14_1_50
8 OBV_min_2
9 OBV_max_2
10 AROONOSC_14
11 BBU_5_2.0
12 BIAS_SMA_26
13 BOP
14 PIVOTS_TRAD_D_R1
15 UO_7_14_28

It is interesting to note when examining the retained features that the fear and greed index data and the seasonality data have been dropped. This indicates that these features may need some additional transformations in order to have a significant impact on the model, or that they are simply not as useful in predicting short term movements as other features.

It is also interesting to note that the candlestick patterns are not included either. There goes hundreds of years of Japanese know how!

The remaining features seem to be a good mix of raw data, momentum and trend indicators. The function names can be found at https://ta-lib.org/functions/ and there are volumes of information on their definition and usage available online.

              precision    recall  f1-score   support

           0       0.94      0.81      0.87     11169
           1       0.31      0.64      0.42      1475

    accuracy                           0.79     12644
   macro avg       0.63      0.73      0.64     12644
weighted avg       0.87      0.79      0.82     12644

The above indicates quite encouraging results for the model when it is tested on the test data. The model predicts the 0 class well but struggles more with class 1. For the use case of this model, whereby a person would use these signals to enter or exit a trade, it could still be useful, as any incorrect position would quickly be exited on a correct prediction of class 0.

The weighted average F1 score also indicates a model with decent predictive power. The results are quite encouraging!

[Figure: Normalised confusion matrix on the test data]

In constructing the confusion matrix, the values have been normalised due to the class imbalance. This generally gives a better understanding of how the model is predicting each class in these cases.

The results are promising, indicating that the model has a relatively small proportion of false positives and false negatives. As false positives are more likely to result in actual loss of investment value, this is encouraging. The model appears to perform relatively well, all things considered.

[Figure: ROC curve on the test data]

$\text{True Positive Rate} = \text{Recall}$

$\text{False Positive Rate} = \frac{\text{False Positive}}{\text{True Negative} + \text{False Positive}}$

The red line on the above diagram represents the situation where the true positive rate is equal to the false positive rate. Points above this line indicate where the proportion of correctly classified points belonging to the positive class is greater than the proportion of incorrectly classified points belonging to the negative class.

A perfect model that correctly classified everything would have its elbow point at the coordinates (0, 1). The area under the curve (AUC) provides an aggregate measure of performance across all possible classification thresholds. An AUC above 0.5 indicates that the model has some predictive power over a random choice; in short, the closer the curve bends towards the top-left corner (and the larger the AUC), the better the predictive power of the model.

An AUC of 0.73 is not bad and indicates that the model has moderate predictive power, which will be tested further in the backtesting section.

All in all, the model appears to be well regularized and performs well on the test data. Finally, a backtest needs to be conducted to simulate how the predictions perform as signals for trading decisions.

Backtesting¶

Approaches to backtesting can vary, but I have implemented a long-only strategy. This will be compared to a long hold from the beginning of the period.

A few assumptions have been made when considering the strategy:

  • That the minimum holding period will be for the duration of the timestep (i.e. 1 hour in this case)
  • That a long position can be taken in the security at the close price from the previous step, thus locking in the maximum return
  • That the investor operating the strategy can borrow for free or has access to cash to continue taking positions when all funds are lost

These assumptions are not appropriate in a real market situation and should be considered further should this strategy ever be used in production.
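
A simplified sketch of the backtest logic under these assumptions is below; `test_returns` and `test_predictions` are assumed to be the hourly log returns and model predictions over the test period, aligned on the same timestamp index.

```python
import pandas as pd

bt = pd.DataFrame({"return": test_returns, "predictions": test_predictions})

# A signal generated at t is held over the *next* hour, so shift it forward one step
bt["holding_begin_period"] = bt["predictions"].shift(1).fillna(0)

# Cumulative log return of the long-only signal strategy vs simply holding ETH
bt["strategy_return"] = (bt["holding_begin_period"] * bt["return"]).cumsum()
bt["long_only_hold"] = bt["return"].cumsum()
```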

[Figure: Cumulative strategy return versus long-only hold]

Wow! I'm going to be a billionaire!

But seriously, the backtesting does appear to indicate that the model is very good at generating a consistent return. However, significantly more testing would need to be undertaken before using the model 'in the wild' with real time market data, order books and real money. The above also indicates that there may be some data leakage in the model that should be investigated as part of any further analysis.

Arguably, the reason for the good backtest is that downward movements are predicted very well (0.94 precision), meaning any mistakes are quickly rectified. Combined with how generally small hourly returns are, any incorrect position would quickly be closed out. Transaction costs would need to be built into any production model to better assess whether the strategy would work.

The first 100 results of the strategy can be found in Appendix 5.

Conclusions¶

The final model failed to beat the sensible baseline by 1.64%. However, the confusion matrix and AUC metrics indicate the model performs well in predicting true positives and true negatives, indicating a good level of performance. It also did relatively well when real investment actions were simulated (subject to further testing), so the model could warrant further analysis incorporating more real world trading conditions.

It is interesting to note that the fear and greed index ultimately had no impact on the analysis, potentially indicating that there is value in ignoring the noise and sticking to a dispassionate strategy.

Thank you for reading.

Appendices¶

Appendix 1 - Process Diagram¶

Appendix 2 - Features and Scalers¶
feature
scaler
RobustScaler ABER_ATR_5_15
MinMaxScaler ABER_SG_5_15
MinMaxScaler ABER_XG_5_15
MinMaxScaler ABER_ZG_5_15
MinMaxScaler ACCBL_20
MinMaxScaler ACCBM_20
MinMaxScaler ACCBU_20
MinMaxScaler AD
RobustScaler ADOSC_3_10
MinMaxScaler ADXR_14_2
MinMaxScaler ADX_14
MinMaxScaler AGj_13_8_5
MinMaxScaler AGl_13_8_5
MinMaxScaler AGt_13_8_5
MinMaxScaler ALMA_9_6.0_0.85
MinMaxScaler ALPHAT_14_1_50
MinMaxScaler ALPHATl_14_1_50_2
MinMaxScaler AMATe_LR_8_21_2
MinMaxScaler AMATe_SR_8_21_2
MinMaxScaler AOBV_LR_2
MinMaxScaler AOBV_SR_2
RobustScaler AO_5_34
RobustScaler APO_12_26
MinMaxScaler AROOND_14
MinMaxScaler AROONOSC_14
MinMaxScaler AROONU_14
MinMaxScaler AR_26
MinMaxScaler ATRTSe_14_20_3.0
RobustScaler ATRr_14
RobustScaler BBB_5_2.0
MinMaxScaler BBL_5_2.0
MinMaxScaler BBM_5_2.0
RobustScaler BBP_5_2.0
MinMaxScaler BBU_5_2.0
RobustScaler BEARP_13
RobustScaler BIAS_SMA_26
MinMaxScaler BOP
MinMaxScaler BR_26
RobustScaler BULLP_13
MinMaxScaler CCI_14_0.015
MinMaxScaler CDL_3WHITESOLDIERS
MinMaxScaler CDL_ADVANCEBLOCK
MinMaxScaler CDL_BELTHOLD
MinMaxScaler CDL_CLOSINGMARUBOZU
MinMaxScaler CDL_DOJI_10_0.1
MinMaxScaler CDL_DRAGONFLYDOJI
MinMaxScaler CDL_GRAVESTONEDOJI
MinMaxScaler CDL_HAMMER
MinMaxScaler CDL_HANGINGMAN
MinMaxScaler CDL_HIGHWAVE
MinMaxScaler CDL_HIKKAKE
MinMaxScaler CDL_HIKKAKEMOD
MinMaxScaler CDL_IDENTICAL3CROWS
MinMaxScaler CDL_INSIDE
MinMaxScaler CDL_LONGLEGGEDDOJI
MinMaxScaler CDL_LONGLINE
MinMaxScaler CDL_MARUBOZU
MinMaxScaler CDL_MATCHINGLOW
MinMaxScaler CDL_RICKSHAWMAN
MinMaxScaler CDL_SEPARATINGLINES
MinMaxScaler CDL_SHORTLINE
MinMaxScaler CDL_SPINNINGTOP
MinMaxScaler CDL_STALLEDPATTERN
MinMaxScaler CDL_TAKURI
RobustScaler CFO_9
RobustScaler CG_10
MinMaxScaler CHDLREXTd_22_22_14_2.0
MinMaxScaler CHDLREXTl_22_22_14_2.0
MinMaxScaler CHDLREXTs_22_22_14_2.0
MinMaxScaler CHOP_14_1_100.0
MinMaxScaler CKSPl_10_3_20
MinMaxScaler CKSPs_10_3_20
MinMaxScaler CMF_20
MinMaxScaler CMO_14
MinMaxScaler COPC_11_14_10
MinMaxScaler CRSI_3_2_100
MinMaxScaler CTI_12
RobustScaler CUBE_3.0_-1
RobustScaler CUBEs_3.0_-1
MinMaxScaler DCL_20_20
MinMaxScaler DCM_20_20
MinMaxScaler DCU_20_20
MinMaxScaler DEC_1
MinMaxScaler DEMA_10
RobustScaler DMN_14
RobustScaler DMP_14
RobustScaler DPO_20
MinMaxScaler D_9_3
MinMaxScaler EBSW_40_10
RobustScaler EFI_13
MinMaxScaler EMA_10
RobustScaler ENTP_10
MinMaxScaler ER_10
MinMaxScaler FAMA_0.5_0.05
MinMaxScaler FISHERT_9_1
MinMaxScaler FISHERTs_9_1
MinMaxScaler FWMA_10
MinMaxScaler HA_close
MinMaxScaler HA_high
MinMaxScaler HA_low
MinMaxScaler HA_open
MinMaxScaler HILO_13_21
MinMaxScaler HL2
MinMaxScaler HLC3
MinMaxScaler HMA_10
MinMaxScaler HWL_1
MinMaxScaler HWMA_0.2_0.1_0.1
MinMaxScaler HWM_1
MinMaxScaler HWU_1
MinMaxScaler ICS_26
MinMaxScaler IKS_26
MinMaxScaler INC_1
MinMaxScaler INERTIA_20_14
MinMaxScaler INVFISHER_1.0
MinMaxScaler INVFISHERs_1.0
MinMaxScaler ISA_9
MinMaxScaler ISB_26
MinMaxScaler ITS_9
MinMaxScaler JMA_7_0.0
MinMaxScaler J_9_3
MinMaxScaler KAMA_10_2_30
MinMaxScaler KCBe_20_2
MinMaxScaler KCLe_20_2
MinMaxScaler KCUe_20_2
MinMaxScaler KST_10_15_20_30_10_10_10_15
MinMaxScaler KSTs_9
RobustScaler KURT_30
RobustScaler KVO_34_55_13
RobustScaler KVOs_34_55_13
MinMaxScaler K_9_3
MinMaxScaler LDECAY_1
MinMaxScaler LINREG_14
RobustScaler LOGRET_1
RobustScaler MACD_12_26_9
RobustScaler MACDh_12_26_9
RobustScaler MACDs_12_26_9
RobustScaler MAD_30
MinMaxScaler MAMA_0.5_0.05
MinMaxScaler MASSI_9_25
MinMaxScaler MCGD_10
MinMaxScaler MEDIAN_30
MinMaxScaler MFI_14
MinMaxScaler MIDPOINT_2
MinMaxScaler MIDPRICE_2
RobustScaler MOM_10
RobustScaler NATR_14
MinMaxScaler NVI_1
MinMaxScaler OBV
MinMaxScaler OBV_max_2
MinMaxScaler OBV_min_2
MinMaxScaler OBVe_12
MinMaxScaler OBVe_4
MinMaxScaler OHLC4
RobustScaler PCTRET_1
RobustScaler PDIST
MinMaxScaler PGO_14
MinMaxScaler PIVOTS_TRAD_D_P
MinMaxScaler PIVOTS_TRAD_D_R1
MinMaxScaler PIVOTS_TRAD_D_R2
MinMaxScaler PIVOTS_TRAD_D_R3
MinMaxScaler PIVOTS_TRAD_D_R4
MinMaxScaler PIVOTS_TRAD_D_S1
MinMaxScaler PIVOTS_TRAD_D_S2
MinMaxScaler PIVOTS_TRAD_D_S3
MinMaxScaler PIVOTS_TRAD_D_S4
RobustScaler PPO_12_26_9
RobustScaler PPOh_12_26_9
RobustScaler PPOs_12_26_9
MinMaxScaler PSARaf_0.02_0.2
MinMaxScaler PSARr_0.02_0.2
MinMaxScaler PSL_12
MinMaxScaler PVI_1
RobustScaler PVOL
MinMaxScaler PVO_12_26_9
MinMaxScaler PVOh_12_26_9
MinMaxScaler PVOs_12_26_9
MinMaxScaler PVR
MinMaxScaler PVT
MinMaxScaler PWMA_10
MinMaxScaler QQE_14_5_4.236
MinMaxScaler QQE_14_5_4.236_RSIMA
RobustScaler QS_10
MinMaxScaler QTL_30_0.5
MinMaxScaler REFLEX_20_20_0.04
MinMaxScaler REMAP_0.0_100.0_-1.0_1.0
MinMaxScaler RMA_10
RobustScaler ROC_10
MinMaxScaler RSI_14
MinMaxScaler RSX_14
MinMaxScaler RVGI_14_4
MinMaxScaler RVGIs_14_4
MinMaxScaler RVI_14
MinMaxScaler RWIh_14
MinMaxScaler RWIl_14
MinMaxScaler SINWMA_14
MinMaxScaler SKEW_30
RobustScaler SLOPE_1
MinMaxScaler SMA_10
MinMaxScaler SMI_5_20_5_1.0
MinMaxScaler SMIo_5_20_5_1.0
MinMaxScaler SMIs_5_20_5_1.0
MinMaxScaler SMMA_7
RobustScaler SQZPRO_20_2.0_20_2.0_1.5_1.0
MinMaxScaler SQZPRO_OFF
MinMaxScaler SQZPRO_ON_NARROW
MinMaxScaler SQZPRO_ON_NORMAL
MinMaxScaler SQZPRO_ON_WIDE
RobustScaler SQZ_20_2.0_20_1.5
MinMaxScaler SQZ_OFF
MinMaxScaler SQZ_ON
MinMaxScaler SSF3_20
MinMaxScaler SSF_20
MinMaxScaler STC_10_12_26_0.5
RobustScaler STCmacd_10_12_26_0.5
MinMaxScaler STCstoch_10_12_26_0.5
RobustScaler STDEV_30
MinMaxScaler STOCHFd_14_3
MinMaxScaler STOCHFk_14_3
MinMaxScaler STOCHRSId_14_14_3_3
MinMaxScaler STOCHRSIk_14_14_3_3
MinMaxScaler STOCHd_14_3_3
MinMaxScaler STOCHh_14_3_3
MinMaxScaler STOCHk_14_3_3
MinMaxScaler SUPERT_7_3.0
MinMaxScaler SUPERTd_7_3.0
MinMaxScaler SWMA_10
MinMaxScaler T3_10_0.7
MinMaxScaler TEMA_10
RobustScaler THERMO_20_2_0.5
MinMaxScaler THERMOl_20_2_0.5
RobustScaler THERMOma_20_2_0.5
MinMaxScaler THERMOs_20_2_0.5
MinMaxScaler TMO_14_5_3
MinMaxScaler TMOs_14_5_3
MinMaxScaler TOS_STDEVALL_LR
MinMaxScaler TOS_STDEVALL_L_1
MinMaxScaler TOS_STDEVALL_L_2
MinMaxScaler TOS_STDEVALL_L_3
MinMaxScaler TOS_STDEVALL_U_1
MinMaxScaler TOS_STDEVALL_U_2
MinMaxScaler TOS_STDEVALL_U_3
MinMaxScaler TRENDFLEX_20_20_0.04
MinMaxScaler TRIMA_10
MinMaxScaler TRIX_30_9
MinMaxScaler TRIXs_30_9
RobustScaler TRUERANGE_1
MinMaxScaler TSI_13_25_13
MinMaxScaler TSIs_13_25_13
RobustScaler TSV_18_10
RobustScaler TSVr_18_10
RobustScaler TSVs_18_10
MinMaxScaler TTM_TRND_6
RobustScaler UI_14
MinMaxScaler UO_7_14_28
RobustScaler VAR_30
MinMaxScaler VHF_28
RobustScaler VHM_610
MinMaxScaler VTXM_14
MinMaxScaler VTXP_14
MinMaxScaler VWAP_D
MinMaxScaler VWMA_10
MinMaxScaler WCP
MinMaxScaler WILLR_14
MinMaxScaler WMA_10
MinMaxScaler ZL_EMA_10
MinMaxScaler ZS_30
MinMaxScaler close
MinMaxScaler close_Z_30_1
MinMaxScaler day_of_week_0
MinMaxScaler day_of_week_1
MinMaxScaler day_of_week_2
MinMaxScaler day_of_week_3
MinMaxScaler day_of_week_4
MinMaxScaler day_of_week_5
MinMaxScaler day_of_week_6
MinMaxScaler fg_value
MinMaxScaler fg_value_classification_extreme fear
MinMaxScaler fg_value_classification_extreme greed
MinMaxScaler fg_value_classification_fear
MinMaxScaler fg_value_classification_greed
MinMaxScaler fg_value_classification_neutral
MinMaxScaler high
MinMaxScaler high_Z_30_1
MinMaxScaler hour_0
MinMaxScaler hour_1
MinMaxScaler hour_10
MinMaxScaler hour_11
MinMaxScaler hour_12
MinMaxScaler hour_13
MinMaxScaler hour_14
MinMaxScaler hour_15
MinMaxScaler hour_16
MinMaxScaler hour_17
MinMaxScaler hour_18
MinMaxScaler hour_19
MinMaxScaler hour_2
MinMaxScaler hour_20
MinMaxScaler hour_21
MinMaxScaler hour_22
MinMaxScaler hour_23
MinMaxScaler hour_3
MinMaxScaler hour_4
MinMaxScaler hour_5
MinMaxScaler hour_6
MinMaxScaler hour_7
MinMaxScaler hour_8
MinMaxScaler hour_9
MinMaxScaler low
MinMaxScaler low_Z_30_1
MinMaxScaler month_1
MinMaxScaler month_10
MinMaxScaler month_11
MinMaxScaler month_12
MinMaxScaler month_2
MinMaxScaler month_3
MinMaxScaler month_4
MinMaxScaler month_5
MinMaxScaler month_6
MinMaxScaler month_7
MinMaxScaler month_8
MinMaxScaler month_9
MinMaxScaler open
MinMaxScaler open_Z_30_1
RobustScaler volume
Appendix 3 - UMAP Parameters¶
[Figure: UMAP projections for varying n_neighbors and min_dist]
Appendix 4 - Hypertuning Results¶
max_binary_accuracy max_binary_accuracy_epoch max_precision max_precision_epoch max_recall max_recall_epoch max_f1 max_f1_epoch
run_name
baseline_24_hour_0028 0.680384 12 0.387734 12 0.863510 4 0.499422 12
baseline_24_hour_0022 0.729239 4 0.419468 4 0.751625 0 0.490235 0
baseline_24_hour_0002 0.656220 1 0.367180 1 0.766945 0 0.483677 1
baseline_24_hour_0012 0.739897 1 0.435823 1 0.592386 0 0.481419 0
baseline_24_hour_0013 0.620977 1 0.349266 1 0.888115 0 0.481225 1
baseline_24_hour_0017 0.634695 5 0.354291 6 0.832869 0 0.480335 6
baseline_24_hour_0027 0.750448 1 0.449738 1 0.805014 10 0.476830 2
baseline_24_hour_0016 0.726918 7 0.419985 7 0.708449 4 0.476242 8
baseline_24_hour_0011 0.601456 0 0.339275 0 0.888115 1 0.475635 0
baseline_24_hour_0025 0.593753 5 0.336167 5 0.887187 1 0.474761 5
baseline_24_hour_0024 0.727551 3 0.420328 3 0.608171 2 0.466529 3
baseline_24_hour_0003 0.578981 1 0.326399 1 0.880687 0 0.463854 1
baseline_24_hour_0019 0.541522 3 0.313341 3 0.887187 4 0.458432 3
baseline_24_hour_0029 0.727973 2 0.396787 2 0.735376 4 0.456958 1
baseline_24_hour_0007 0.655587 1 0.355091 1 0.631383 1 0.454545 1
baseline_24_hour_0014 0.576765 2 0.318535 0 0.837512 1 0.452256 0
baseline_24_hour_0006 0.514298 1 0.302468 1 0.882544 0 0.448941 1
baseline_24_hour_0026 0.505434 0 0.298745 0 0.908542 9 0.445128 0
baseline_24_hour_0018 0.546164 4 0.303816 0 0.779480 2 0.436116 0
baseline_24_hour_0020 0.591326 4 0.306811 4 0.894150 0 0.434435 3
baseline_24_hour_0015 0.523267 0 0.296768 0 0.801300 0 0.433124 0
baseline_24_hour_0021 0.479582 3 0.286308 3 0.897864 4 0.430090 3
baseline_24_hour_0023 0.671204 2 0.354595 2 0.891829 0 0.429513 2
baseline_24_hour_0004 0.523794 0 0.287439 0 0.740483 0 0.414124 0
baseline_24_hour_0001 0.561465 0 0.296793 0 0.954967 1 0.412994 0
baseline_24_hour_0005 0.728606 0 0.340214 0 0.591922 1 0.407739 1
baseline_24_hour_0000 0.532658 0 0.273812 0 0.808728 1 0.407390 1
baseline_24_hour_0010 0.700327 0 0.369482 0 0.450789 0 0.406106 0
baseline_24_hour_0009 0.457951 0 0.251768 1 0.776695 1 0.380271 1
baseline_24_hour_0008 0.408674 1 0.226010 0 0.849582 0 0.357038 0
Appendix 5 - Strategy Results¶
return long_only_hold predictions holding_begin_period holding_end_period strategy_return
unix
2018-02-27 08:00:00+00:00 0.002905 0.002905 0 0.0 0 0.000000
2018-02-27 09:00:00+00:00 -0.002501 0.000404 0 0.0 0 0.000000
2018-02-27 10:00:00+00:00 -0.014487 -0.014083 1 0.0 1 0.000000
2018-02-27 11:00:00+00:00 0.005351 -0.008732 0 1.0 0 0.005351
2018-02-27 12:00:00+00:00 -0.000102 -0.008834 0 0.0 0 0.005351
2018-02-27 13:00:00+00:00 0.007395 -0.001439 0 0.0 0 0.005351
2018-02-27 14:00:00+00:00 -0.010175 -0.011614 0 0.0 0 0.005351
2018-02-27 15:00:00+00:00 -0.011429 -0.023043 0 0.0 0 0.005351
2018-02-27 16:00:00+00:00 0.004129 -0.018913 0 0.0 0 0.005351
2018-02-27 17:00:00+00:00 -0.005636 -0.024550 0 0.0 0 0.005351
2018-02-27 18:00:00+00:00 0.004113 -0.020437 0 0.0 0 0.005351
2018-02-27 19:00:00+00:00 0.005157 -0.015280 0 0.0 0 0.005351
2018-02-27 20:00:00+00:00 0.000057 -0.015223 0 0.0 0 0.005351
2018-02-27 21:00:00+00:00 0.006820 -0.008403 0 0.0 0 0.005351
2018-02-27 22:00:00+00:00 -0.009023 -0.017426 1 0.0 1 0.005351
2018-02-27 23:00:00+00:00 -0.004112 -0.021538 1 1.0 1 0.001239
2018-02-28 00:00:00+00:00 0.006292 -0.015246 1 1.0 1 0.007532
2018-02-28 01:00:00+00:00 0.004858 -0.010388 1 1.0 1 0.012390
2018-02-28 02:00:00+00:00 -0.000897 -0.011285 1 1.0 1 0.011493
2018-02-28 03:00:00+00:00 0.007582 -0.003702 0 1.0 0 0.019075
2018-02-28 04:00:00+00:00 -0.004769 -0.008471 0 0.0 0 0.019075
2018-02-28 05:00:00+00:00 0.000668 -0.007803 0 0.0 0 0.019075
2018-02-28 06:00:00+00:00 -0.008709 -0.016512 0 0.0 0 0.019075
2018-02-28 07:00:00+00:00 -0.000845 -0.017358 0 0.0 0 0.019075
2018-02-28 08:00:00+00:00 -0.013752 -0.031110 0 0.0 0 0.019075
2018-02-28 09:00:00+00:00 0.003586 -0.027524 0 0.0 0 0.019075
2018-02-28 10:00:00+00:00 -0.002150 -0.029674 0 0.0 0 0.019075
2018-02-28 11:00:00+00:00 0.001052 -0.028622 0 0.0 0 0.019075
2018-02-28 12:00:00+00:00 -0.007367 -0.035988 1 0.0 1 0.019075
2018-02-28 13:00:00+00:00 0.007182 -0.028807 0 1.0 0 0.026257
2018-02-28 14:00:00+00:00 0.005074 -0.023733 0 0.0 0 0.026257
2018-02-28 15:00:00+00:00 -0.001877 -0.025609 0 0.0 0 0.026257
2018-02-28 16:00:00+00:00 -0.004724 -0.030334 1 0.0 1 0.026257
2018-02-28 17:00:00+00:00 0.002844 -0.027489 1 1.0 1 0.029101
2018-02-28 18:00:00+00:00 -0.000323 -0.027813 1 1.0 1 0.028778
2018-02-28 19:00:00+00:00 0.002457 -0.025356 0 1.0 0 0.031235
2018-02-28 20:00:00+00:00 0.001669 -0.023687 0 0.0 0 0.031235
2018-02-28 21:00:00+00:00 -0.003191 -0.026878 0 0.0 0 0.031235
2018-02-28 22:00:00+00:00 -0.008936 -0.035814 0 0.0 0 0.031235
2018-02-28 23:00:00+00:00 -0.010710 -0.046523 0 0.0 0 0.031235
2018-03-01 00:00:00+00:00 0.005422 -0.041101 0 0.0 0 0.031235
2018-03-01 01:00:00+00:00 0.001883 -0.039219 0 0.0 0 0.031235
2018-03-01 02:00:00+00:00 -0.003429 -0.042648 0 0.0 0 0.031235
2018-03-01 03:00:00+00:00 0.002891 -0.039756 0 0.0 0 0.031235
2018-03-01 04:00:00+00:00 0.002894 -0.036862 0 0.0 0 0.031235
2018-03-01 05:00:00+00:00 -0.000245 -0.037107 1 0.0 1 0.031235
2018-03-01 06:00:00+00:00 0.000909 -0.036198 1 1.0 1 0.032144
2018-03-01 07:00:00+00:00 0.005077 -0.031121 1 1.0 1 0.037220
2018-03-01 08:00:00+00:00 0.005512 -0.025609 1 1.0 1 0.042732
2018-03-01 09:00:00+00:00 -0.003590 -0.029200 1 1.0 1 0.039142
2018-03-01 10:00:00+00:00 0.006731 -0.022468 0 1.0 0 0.045873
2018-03-01 11:00:00+00:00 -0.001932 -0.024400 0 0.0 0 0.045873
2018-03-01 12:00:00+00:00 -0.006420 -0.030820 0 0.0 0 0.045873
2018-03-01 13:00:00+00:00 0.003965 -0.026855 0 0.0 0 0.045873
2018-03-01 14:00:00+00:00 -0.001397 -0.028252 0 0.0 0 0.045873
2018-03-01 15:00:00+00:00 -0.004006 -0.032258 0 0.0 0 0.045873
2018-03-01 16:00:00+00:00 0.004606 -0.027651 0 0.0 0 0.045873
2018-03-01 17:00:00+00:00 0.008039 -0.019612 0 0.0 0 0.045873
2018-03-01 18:00:00+00:00 0.004560 -0.015052 0 0.0 0 0.045873
2018-03-01 19:00:00+00:00 -0.004961 -0.020013 0 0.0 0 0.045873
2018-03-01 20:00:00+00:00 0.004642 -0.015371 0 0.0 0 0.045873
2018-03-01 21:00:00+00:00 -0.006523 -0.021894 0 0.0 0 0.045873
2018-03-01 22:00:00+00:00 -0.001298 -0.023192 0 0.0 0 0.045873
2018-03-01 23:00:00+00:00 -0.001288 -0.024481 0 0.0 0 0.045873
2018-03-02 00:00:00+00:00 0.000288 -0.024193 0 0.0 0 0.045873
2018-03-02 01:00:00+00:00 0.004054 -0.020139 0 0.0 0 0.045873
2018-03-02 02:00:00+00:00 0.000023 -0.020116 0 0.0 0 0.045873
2018-03-02 03:00:00+00:00 -0.002008 -0.022124 0 0.0 0 0.045873
2018-03-02 04:00:00+00:00 -0.002069 -0.024193 0 0.0 0 0.045873
2018-03-02 05:00:00+00:00 0.001104 -0.023089 0 0.0 0 0.045873
2018-03-02 06:00:00+00:00 0.002124 -0.020965 0 0.0 0 0.045873
2018-03-02 07:00:00+00:00 -0.003240 -0.024204 0 0.0 0 0.045873
2018-03-02 08:00:00+00:00 0.004180 -0.020024 0 0.0 0 0.045873
2018-03-02 09:00:00+00:00 -0.011595 -0.031620 0 0.0 0 0.045873
2018-03-02 10:00:00+00:00 -0.002612 -0.034232 0 0.0 0 0.045873
2018-03-02 11:00:00+00:00 -0.000581 -0.034813 0 0.0 0 0.045873
2018-03-02 12:00:00+00:00 0.003622 -0.031191 0 0.0 0 0.045873
2018-03-02 13:00:00+00:00 -0.000429 -0.031620 0 0.0 0 0.045873
2018-03-02 14:00:00+00:00 -0.011532 -0.043152 0 0.0 0 0.045873
2018-03-02 15:00:00+00:00 0.000867 -0.042284 0 0.0 0 0.045873
2018-03-02 16:00:00+00:00 0.000679 -0.041605 0 0.0 0 0.045873
2018-03-02 17:00:00+00:00 -0.000375 -0.041980 0 0.0 0 0.045873
2018-03-02 18:00:00+00:00 -0.001313 -0.043292 0 0.0 0 0.045873
2018-03-02 19:00:00+00:00 -0.004679 -0.047972 1 0.0 1 0.045873
2018-03-02 20:00:00+00:00 0.007713 -0.040259 0 1.0 0 0.053586
2018-03-02 21:00:00+00:00 0.004620 -0.035639 0 0.0 0 0.053586
2018-03-02 22:00:00+00:00 -0.000699 -0.036338 0 0.0 0 0.053586
2018-03-02 23:00:00+00:00 -0.003746 -0.040084 1 0.0 1 0.053586
2018-03-03 00:00:00+00:00 0.003105 -0.036979 1 1.0 1 0.056691
2018-03-03 01:00:00+00:00 0.007524 -0.029454 0 1.0 0 0.064216
2018-03-03 02:00:00+00:00 -0.001076 -0.030531 0 0.0 0 0.064216
2018-03-03 03:00:00+00:00 -0.004085 -0.034615 0 0.0 0 0.064216
2018-03-03 04:00:00+00:00 0.003135 -0.031481 0 0.0 0 0.064216
2018-03-03 05:00:00+00:00 -0.002065 -0.033546 0 0.0 0 0.064216
2018-03-03 06:00:00+00:00 0.001010 -0.032536 0 0.0 0 0.064216
2018-03-03 07:00:00+00:00 -0.004349 -0.036885 0 0.0 0 0.064216
2018-03-03 08:00:00+00:00 0.003792 -0.033093 0 0.0 0 0.064216
2018-03-03 09:00:00+00:00 -0.001580 -0.034673 0 0.0 0 0.064216
2018-03-03 10:00:00+00:00 0.000953 -0.033720 0 0.0 0 0.064216
2018-03-03 11:00:00+00:00 0.001045 -0.032675 0 0.0 0 0.064216