{"id":14509,"date":"2023-04-11T05:01:54","date_gmt":"2023-04-10T21:01:54","guid":{"rendered":"https:\/\/www.tejwin.com\/?post_type=insight&#038;p=14509"},"modified":"2026-02-25T11:56:10","modified_gmt":"2026-02-25T03:56:10","slug":"gru-and-lstm","status":"publish","type":"insight","link":"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/","title":{"rendered":"GRU and LSTM"},"content":{"rendered":"\n<figure class=\"wp-block-image\" id=\"1d47\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1_09cK4_87IzMYiwT0M.jpg\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by <a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/unsplash.com\/@markuswinkler?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"photo-creator noopener\" data-href=\"https:\/\/unsplash.com\/@markuswinkler?utm_source=medium&amp;utm_medium=referral\" data->Markus Winkler<\/a> on&nbsp;<a class=\"markup--anchor markup--figure-anchor\" href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"photo-source noopener\" data-href=\"https:\/\/unsplash.com?utm_source=medium&amp;utm_medium=referral\" data->Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p id=\"5d66\">&nbsp;<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_81 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a117f131bc9d\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"ez-toc-cssicon\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 
0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a117f131bc9d\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Highlights\" >Highlights:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Preface\" >Preface<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Programming_environment_and_Module_required\" >Programming environment and Module required<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Database\" >Database<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Import_data\" >Import data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Single_layer_LSTM\" >Single layer LSTM<\/a><ul class='ez-toc-list-level-3' ><li 
class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Draw_loss_curve\" >Draw loss curve<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Double_layer_LSTM\" >Double layer LSTM<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Draw_loss_curve-2\" >Draw loss curve<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Single_layer_GRU\" >Single layer GRU<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Double_layer_GRU\" >Double layer GRU<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Draw_loss_curve-3\" >Draw loss curve<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Source_Code\" >Source Code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Extended_Reading\" >Extended Reading<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/tejwin20260323.j.webweb.today\/en\/insight\/gru-and-lstm\/#Related_Link\" >Related Link<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"8bc4\"><span class=\"ez-toc-section\" id=\"Highlights\"><\/span>Highlights:<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Difficulty: \u2605\u2605\u2605\u2605\u2605<\/li>\n\n\n\n<li>Using historical stock prices to predict the next day\u2019s close price.<\/li>\n\n\n\n<li>Advice: This article uses two RNN-based models for time series prediction, so basic knowledge of time series prediction and deep learning is required. For background on LSTM, see our Medium article <a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-lstm-trading-signal-judgment-edf67584a564\" target=\"_blank\" rel=\"noopener\">\u3010Data Analysis\u3011LSTM Trading Signal Judgment<\/a>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12e8\"><span class=\"ez-toc-section\" id=\"Preface\"><\/span>Preface<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"18be\">Chasing profit and avoiding risk come naturally to all investors, and one way to pursue both goals is to predict future stock movements. In the past, time series models such as ARIMA and GARCH were widely used to model future stock prices and their volatility. With the boom in artificial intelligence, more and more deep learning models for time series have emerged, and they look like promising new tools for stock price prediction. In this article, we apply GRU and LSTM models to stock price prediction, using the open, high, low and close prices of the past five days to predict the next day\u2019s close price.<\/p>\n\n\n\n<p id=\"b9b2\">Many articles already describe the LSTM model, so we will not introduce it again here. 
The GRU model is our focus today. Like LSTM, GRU is an RNN-based model.<strong>&nbsp;However, LSTM has three gates (the forget gate, the input gate and the output gate), while GRU has only two: the update gate and the reset gate. The update gate plays the combined role of LSTM\u2019s forget and input gates: at each time step, it decides how much of the previous hidden state is kept and how much is replaced by new candidate information. The reset gate decides how much of the past hidden state is used when computing that new candidate.<\/strong>&nbsp;Because it has fewer gates, and therefore fewer parameters, GRU should in theory compute faster with little or no loss in performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0fac\"><span class=\"ez-toc-section\" id=\"Programming_environment_and_Module_required\"><\/span>Programming environment and Module required<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"dcc5\">We use Google Colab as the editor.<\/p>\n\n\n\n<pre id=\"586e\" class=\"wp-block-preformatted\"># Load required modules\nimport pandas as pd\nimport numpy as np\nfrom sklearn.preprocessing import StandardScaler\nimport plotly.graph_objects as go\nimport os\nimport time\nimport tejapi\nimport math\nimport torch\nfrom torch import nn, optim\nfrom torch.utils.data import Dataset, DataLoader, TensorDataset\n\n# log in to TEJ API\napi_key = 'YOUR_KEY'\ntejapi.ApiConfig.api_key = api_key\ntejapi.ApiConfig.ignoretz = True\n\n# use the GPU if available, otherwise the CPU\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0da5\"><span class=\"ez-toc-section\" id=\"Database\"><\/span>Database<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"f8ad\"><a href=\"https:\/\/api.tej.com.tw\/columndoc.html?subId=42\" rel=\"noreferrer noopener\" target=\"_blank\">Stock trading database<\/a>: unadjusted daily stock prices; the database code is TWN\/APRCD.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fa0c\"><span class=\"ez-toc-section\" 
id=\"Import_data\"><\/span>Import data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"e633\">Here, we use the unadjusted open, high, low and close prices of TSMC (2330.TW) as input features. The sampling period runs from 2019-01-01 to 2023-01-01. First, we standardize the features, which solves the feature scaling problem and speeds up training. Then we split the standardized dataset into a training set and a validation set in chronological order, with a ratio of 8:2.<\/p>\n\n\n\n<pre id=\"9473\" class=\"wp-block-preformatted\"># import data from the TEJ database\ngte, lte = '2019-01-01', '2023-01-01'\ndata = tejapi.get('TWN\/APRCD',\n                   paginate = True,\n                   coid = '2330', \n                   mdate = {'gte':gte, 'lte':lte},\n                   opts = {\n                       'columns':[ 'mdate', 'open_d', 'high_d', 'low_d', 'close_d', 'volume']\n                   }\n                  )\n# keep the dates for plotting; StandardScaler only accepts numeric columns\nmdate = data['mdate']\n# standardization (columns: open_d, high_d, low_d, close_d, volume)\nscaler = StandardScaler()\ndata = scaler.fit_transform(data[['open_d', 'high_d', 'low_d', 'close_d', 'volume']])\n\n# chronological 80\/20 train validation split, keeping only open, high, low and close\ntrain, test = data[:int(0.8 * len(data)), :4], data[int(0.8 * len(data)):, :4]<\/pre>\n\n\n\n<p id=\"8186\">Next, we build the training and validation samples with PyTorch. create_dataset turns each 5-day window of open, high, low and close prices into one sample and uses the following day\u2019s close price as its target. TensorDataset and DataLoader then group the samples into batches so that they can be fed into the model conveniently.<\/p>\n\n\n\n<pre id=\"72b9\" class=\"wp-block-preformatted\">def create_dataset(dataset, lookback):\n    X, y = [], []\n    for i in range(len(dataset)-lookback):\n        # features: rows i to i+lookback-1 (all four price columns)\n        feature = dataset[i:i+lookback, :]\n        # target: close price (last column) of the day right after the window\n        target = dataset[i+lookback, -1]\n        X.append(feature)\n        y.append(target)\n    return torch.FloatTensor(np.array(X)).to(device), torch.FloatTensor(np.array(y)).view(-1, 1).to(device)\n\nlookback = 5 # set the window to 5 days\nX_train, y_train = create_dataset(train, lookback = lookback)\nX_val, y_val = create_dataset(test, lookback = lookback)\nloader = DataLoader(TensorDataset(X_train, y_train), 
shuffle = False, batch_size = 32)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5c47\"><span class=\"ez-toc-section\" id=\"Single_layer_LSTM\"><\/span>Single layer LSTM<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"4d0d\">The single layer LSTM consists of one LSTM layer followed by a Dropout layer and a fully connected layer. The Dropout layer helps prevent overfitting.<\/p>\n\n\n\n<p id=\"4ad4\">\u25cf input_size: the number of input features. We use the open, high, low and close prices, so input_size = 4.<br>\u25cf hidden_size: the number of units in the LSTM hidden state.<br>\u25cf num_layers: the number of stacked LSTM layers; the default is 1.<br>\u25cf batch_first: if True, the input and output tensors have the shape (batch_size, sequence_length, features), so the LSTM output here is (batch_size, sequence_length, hidden_size). sequence_length = 5 because the window is 5 days.<\/p>\n\n\n\n<pre id=\"dfec\" class=\"wp-block-preformatted\"># Create the single layer LSTM model\nclass S_LSTM(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.lstm1 = nn.LSTM(input_size = 4, hidden_size=64, num_layers=1, batch_first=True)\n        self.dropout = nn.Dropout(0.2)\n        self.linear = nn.Linear(64, 1)\n    def forward(self, x):\n        x, _ = self.lstm1(x)   # (batch, 5, 64)\n        x = self.dropout(x)\n        x = x[:, -1, :]        # keep only the last time step: (batch, 64)\n        x = self.linear(x)     # predicted next-day close: (batch, 1)\n        return x\n\n# Create training process function\ndef trainer(epochs, loader, X_train, y_train, X_val, y_val, model, criterion, optimizer):\n  train_loss, test_loss = [],[]\n  for epoch in range(epochs):\n    # one pass over the mini-batches\n    model.train()\n    for batch, (x, y_true) in enumerate(loader):\n      y_pred = model(x)\n      loss = criterion(y_pred, y_true)\n      loss.backward()\n      optimizer.step()\n      optimizer.zero_grad()\n    # evaluate on the full training and validation sets (RMSE = square root of MSE)\n    model.eval()\n    with torch.no_grad():\n      y_pred = model(X_train)\n      train_rmse = np.sqrt(criterion(y_pred, y_train).item())\n      train_loss.append(train_rmse)\n      y_pred = model(X_val)\n      test_rmse = 
np.sqrt(criterion(y_pred, y_val).item())\n      test_loss.append(test_rmse)\n      if (epoch+1) % 100 == 0:\n        print('epoch %d train rmse %.4f test rmse %.4f' % (epoch+1, train_rmse, test_rmse))\n  return train_loss, test_loss\n\n# Set model, loss function and optimizer\nmodel = S_LSTM().to(device)\ncriterion = nn.MSELoss()\noptimizer = optim.Adam(model.parameters())\nepochs = 1000\n\n# Train start and compute time cost\nstart = time.time()\nslstm_train_loss, slstm_test_loss = trainer(epochs, loader, X_train, y_train, X_val, y_val, model, criterion, optimizer)\nend = time.time()\nprint('single lstm time cost %.4f' %(end-start))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1AFmx2l1nM40wz0NYi3PX1w-2.png\" alt=\"training result\"\/><figcaption class=\"wp-element-caption\">training result<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"79b0\"><span class=\"ez-toc-section\" id=\"Draw_loss_curve\"><\/span>Draw loss curve<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<pre id=\"9b53\" class=\"wp-block-preformatted\"><span class=\"pre--content\">fig = go.Figure()\nfig.add_trace(go.Scatter(x=np.arange(epochs), y=slstm_train_loss,\n                    mode=<span class=\"hljs-string\">'lines'<\/span>,\n                    name=<span class=\"hljs-string\">'Train Loss'<\/span>))\nfig.add_trace(go.Scatter(x=np.arange(epochs) , y=slstm_test_loss,\n                    mode=<span class=\"hljs-string\">'lines'<\/span>,\n                    name=<span class=\"hljs-string\">'Validation Loss'<\/span>))\nfig.update_layout(\n    title=<span class=\"hljs-string\">\"Loss curve for single lstm\"<\/span>,\n    xaxis_title=<span class=\"hljs-string\">\"epochs\"<\/span>,\n    yaxis_title=<span 
class=\"hljs-string\">\"rmse\"<\/span>\n)\nfig.show()<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/19SplDOh1_H93pHnIXWS5YA-2.png\" alt=\"Loss curve for single layer LSTM\"\/><figcaption class=\"wp-element-caption\">Loss curve for single layer LSTM<\/figcaption><\/figure>\n\n\n\n<p id=\"f4ef\">The loss curve above shows that the validation loss converges to about 0.07 by the 200th epoch. Next, we plot the predicted prices against the actual ones to check how well the single layer LSTM predicts.<\/p>\n\n\n\n<pre id=\"d9bf\" class=\"wp-block-preformatted\"># NaN placeholders so that each series is drawn only where it has predictions\ntrain_plot = np.ones_like(data[:, 3]) * np.nan\ntest_plot = np.ones_like(data[:, 3]) * np.nan\nwith torch.no_grad():\n  # predict train data; the first prediction targets day lookback\n  y_pred = model(X_train)\n  train_plot[lookback:int(0.8 * len(data))] = y_pred.view(-1).cpu().numpy()\n  # predict validation data\n  y_pred = model(X_val)\n  test_plot[int(0.8 * len(data))+lookback:] = y_pred.view(-1).cpu().numpy()\n\nfig = go.Figure()\nfig.add_trace(go.Scatter(x=mdate, y=train_plot,\n                    mode='lines',\n                    name='Train'))\nfig.add_trace(go.Scatter(x=mdate , y=test_plot,\n                    mode='lines',\n                    name='Validation'))\nfig.add_trace(go.Scatter(x=mdate , y=data[:, 3],\n                    mode='lines',\n                    name='True'))\nfig.update_layout(\n    title=\"Stock prediction for single lstm\",\n    xaxis_title=\"dates\",\n    yaxis_title=\"standardized close price\"\n)\nfig.show()\n<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1fjFDJNm3IYZNRj-ooIZHwg-2.png\" 
alt=\"Price prediction for single layer LSTM\"\/><figcaption class=\"wp-element-caption\">Price prediction for single layer LSTM<\/figcaption><\/figure>\n\n\n\n<p id=\"b86d\">The stock prediction and loss curve plots show that the single layer LSTM predicts quite well. This result is intriguing because it conflicts with our previous result in&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-lstm-trading-signal-judgment-edf67584a564\" target=\"_blank\" rel=\"noopener\">\u3010Data Analysis\u3011LSTM Trading Signal Judgment<\/a>. In that article, the single layer LSTM could not fully capture the time series information and predicted poorly. The previous model differs from this one in three ways: it also used daily trading volume as an input feature, it had an LSTM output dimension of 32 (64 here), and it used a dropout ratio of 0.3 (0.2 here). For now, we believe the most likely reason is the use of daily trading volume as an input feature.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"3c4c\"><span class=\"ez-toc-section\" id=\"Double_layer_LSTM\"><\/span>Double layer LSTM<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"f5df\">Although the single layer model already performs well, we also try stacking more LSTM layers to see whether we can reach a better score. The stacked LSTM has the following structure: one LSTM layer \u2192 one Dropout layer \u2192 one LSTM layer \u2192 one Dropout layer \u2192 one fully connected layer. 
Both Dropout layers use a dropout ratio of 0.4.<\/p>\n\n\n\n<pre id=\"f5c5\" class=\"wp-block-preformatted\"># Create the double layer (stacked) LSTM model\nclass LSTM(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.lstm1 = nn.LSTM(input_size = 4, hidden_size=64, num_layers=1, batch_first=True)\n        self.dropout1 = nn.Dropout(0.4)\n        # the second LSTM reads the 64-dimensional outputs of the first one\n        self.lstm2 = nn.LSTM(input_size = 64, hidden_size=32, num_layers=1, batch_first=True)\n        self.dropout2 = nn.Dropout(0.4)\n        self.linear = nn.Linear(32, 1)\n    def forward(self, x):\n        x, _ = self.lstm1(x)\n        x = self.dropout1(x)\n        x, _ = self.lstm2(x)\n        x = self.dropout2(x)\n        x = x[:, -1, :]  # keep only the last time step\n        x = self.linear(x)\n        return x\n\n# trainer() is the same training function defined in the single layer LSTM section\n\n# Set model, optimizer, loss function\nmodel = LSTM().to(device)\ncriterion = nn.MSELoss()\noptimizer = optim.Adam(model.parameters())\nepochs = 1000\n# Train start and compute time cost\nstart = time.time()\nlstm_train_loss, lstm_test_loss = trainer(epochs, loader, X_train, y_train, X_val, y_val, model, criterion, optimizer)\nend = time.time()\nprint('stack lstm time 
cost %.4f' %(end-start))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1bU8RXcbDqLK4vewuxq2VUQ-2.png\" alt=\"training result\"\/><figcaption class=\"wp-element-caption\">training result<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4fef\"><span class=\"ez-toc-section\" id=\"Draw_loss_curve-2\"><\/span>Draw loss curve<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1m8s502EhQYoH9Jy1G25qJw-2.png\" alt=\"Stacked LSTM loss curve\"\/><figcaption class=\"wp-element-caption\">Stacked LSTM loss curve<\/figcaption><\/figure>\n\n\n\n<p id=\"6d55\">As model complexity increases, convergence slows down: the stacked LSTM does not converge, at a loss of about 0.1, until around the 500th epoch. The figure below shows that the stacked LSTM actually predicts worse than the single layer LSTM, although it can still capture the trend of the stock price. Its loss curve is also more volatile than that of the single layer LSTM. 
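<\/p>\n\n\n\n<p>One simple way to put a number on how volatile a loss curve is uses the standard deviation of the epoch-to-epoch changes in loss after an initial burn-in period. This is an illustrative assumption, not something the original experiment computed. The helper name loss_volatility and the 100-epoch burn-in are hypothetical choices. The input is one of the per-epoch RMSE lists returned by trainer().<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import numpy as np


def loss_volatility(losses, burn_in=100):
    # Hypothetical helper, not part of the original article: measures how much
    # a loss curve jumps around from one epoch to the next once the early,
    # steep descent is over. It returns the standard deviation of the first
    # differences of the per-epoch losses after `burn_in` epochs.
    tail = np.asarray(losses, dtype=float)[burn_in:]
    return float(np.std(np.diff(tail)))


# Usage with the RMSE lists returned by trainer(), for example:
# print(loss_volatility(slstm_test_loss), loss_volatility(lstm_test_loss))
```
<\/pre>\n\n\n\n<p>A larger value means a more jagged curve. The burn-in matters because otherwise the steep drop in the first epochs would dominate the measure for every model.<\/p>\n\n\n\n<p>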
The Python code for the loss curve and prediction plots is available in the source code at the end.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/14uYK4GtyuRNvWH0jwhCfnA-2.png\" alt=\"Stacked LSTM stock prediction\"\/><figcaption class=\"wp-element-caption\">Stacked LSTM stock prediction<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"dbce\"><span class=\"ez-toc-section\" id=\"Single_layer_GRU\"><\/span>Single layer GRU<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"3f22\">Next, we use a single layer GRU for prediction. Its structure is the same as the single layer LSTM, except that the LSTM layer is replaced with a GRU layer.<\/p>\n\n\n\n<pre id=\"3127\" class=\"wp-block-preformatted\"># Create the single layer GRU model\nclass S_GRU(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.gru1 = nn.GRU(input_size = 4, hidden_size=64, num_layers=1, batch_first = True)\n        self.dropout = nn.Dropout(0.2)\n        self.linear = nn.Linear(64, 1)\n    def forward(self, x):\n        x, _ = self.gru1(x)\n        x = self.dropout(x)\n        x = x[:, -1, :]  # keep only the last time step\n        x = self.linear(x)\n        return x\n\n# trainer() is the same training function defined in the single layer LSTM section\n\n# Set model, optimizer and loss function\nmodel = S_GRU().to(device)\ncriterion = nn.MSELoss()\noptimizer = optim.Adam(model.parameters())\nepochs = 1000\n# Train start and compute time cost\nstart = time.time()\nsgru_train_loss, sgru_test_loss = trainer(epochs, loader, X_train, y_train, X_val, y_val, model, criterion, optimizer)\nend = time.time()\nprint('single gru time cost %.4f' %(end-start))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1RZfz22LPHpwj0JcwgtVS3g-2.png\" alt=\"training result\"\/><figcaption class=\"wp-element-caption\">training result<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Draw loss curve<\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1_bh0T5tVNUk1GqpTzwoqEw-2.png\" alt=\"Single layer GRU loss curve\"\/><figcaption class=\"wp-element-caption\">Single layer GRU loss curve<\/figcaption><\/figure>\n\n\n\n<p id=\"85eb\">The training loss curve of the single layer GRU fluctuates a bit more than that of the single layer LSTM. Like the single layer LSTM, however, its loss converges to about 0.07 by the 200th epoch. 
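<\/p>\n\n\n\n<p>The RMSE values in this article are computed on standardized prices. A loss of about 0.07, the level the single layer LSTM converges to, is therefore measured in standard deviations of the close price rather than in TWD. Below is a minimal sketch of converting it back to price units. It assumes the population standard deviation (ddof=0) that StandardScaler uses. The helper name rmse_in_price_units is hypothetical.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
import numpy as np


def rmse_in_price_units(rmse_std, close_prices):
    # Hypothetical helper: the models are trained on standardized prices,
    # so an RMSE of 0.07 means 0.07 standard deviations of the close price.
    # Multiplying by that standard deviation expresses it in price units.
    # np.std defaults to ddof=0, the same convention StandardScaler uses.
    sigma = float(np.std(np.asarray(close_prices, dtype=float)))
    return rmse_std * sigma


# With the fitted scaler from the import step, the same sigma is stored in
# scaler.scale_[3] (close_d is the fourth scaled column), so the conversion
# is simply 0.07 * scaler.scale_[3].
```
<\/pre>\n\n\n\n<p>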
As with the single layer LSTM, the single layer GRU predicts the price very well.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/12fa9sNebAOk8_N4dyViGaQ-2.png\" alt=\"Single layer GRU price prediction\"\/><figcaption class=\"wp-element-caption\">Single layer GRU price prediction<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"68bb\"><span class=\"ez-toc-section\" id=\"Double_layer_GRU\"><\/span>Double layer GRU<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"a95c\">We also build a stacked GRU model to check whether a more complex GRU model performs better. The stacked structure is: one GRU layer \u2192 one Dropout layer \u2192 one GRU layer \u2192 one Dropout layer \u2192 one fully connected layer. Both Dropout layers use a dropout ratio of 0.4.<\/p>\n\n\n\n<pre id=\"705a\" class=\"wp-block-preformatted\"># Create the double layer (stacked) GRU model\nclass GRU(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.gru1 = nn.GRU(input_size = 4, hidden_size=64, num_layers=1, batch_first=True)\n        self.dropout1 = nn.Dropout(0.4)\n        # the second GRU reads the 64-dimensional outputs of the first one\n        self.gru2 = nn.GRU(input_size = 64, hidden_size=32, num_layers=1, batch_first=True)\n        self.dropout2 = nn.Dropout(0.4)\n        self.linear = nn.Linear(32, 1)\n    def forward(self, x):\n        x, _ = self.gru1(x)\n        x = self.dropout1(x)\n        x, _ = self.gru2(x)\n        x = self.dropout2(x)\n        x = x[:, -1, :]  # keep only the last time step\n        x = self.linear(x)\n        return x\n\n# trainer() is the same training function defined in the single layer LSTM section\n\n# Set model, optimizer and loss function\nmodel = GRU().to(device)\ncriterion = nn.MSELoss()\noptimizer = optim.Adam(model.parameters())\nepochs = 1000\n# Train start and compute time cost\nstart = time.time()\ngru_train_loss, gru_test_loss = trainer(epochs, loader, X_train, y_train, X_val, y_val, model, criterion, optimizer)\nend = time.time()\nprint('stack gru time cost %.4f' %(end-start))<\/pre>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized caption-align-center graf-image\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/129f7eWDgp1icMqtPahg-ow-2.png\" alt=\"training result\" width=\"357\" height=\"186\"\/><figcaption class=\"wp-element-caption\">training result<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"a9a3\"><span class=\"ez-toc-section\" id=\"Draw_loss_curve-3\"><\/span>Draw loss curve<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1iWnYnTlcvKAPScFXEzRMKw-2.png\" alt=\"Double layer GRU loss curve\"\/><figcaption 
class=\"wp-element-caption\">Double layer GRU loss curve<\/figcaption><\/figure>\n\n\n\n<p id=\"3301\">The loss curve of the stacked GRU is more volatile than that of the single layer GRU, and it gradually converges to 0.7 by the 300th epoch. The figure below shows that the stacked GRU clearly predicts worse than the single layer GRU.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1eSazxslxl61pPb1SCS40SQ-2.png\" alt=\"Stacked GRU price prediction\"\/><figcaption class=\"wp-element-caption\">Stacked GRU price prediction<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ce6a\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"5b5e\">Overall, both the single layer LSTM and the single layer GRU predict the TSMC stock price well, while the stacked models perform somewhat worse. The next figure compares the loss curves of the two single layer models. Both converge to around 0.07 and are similarly volatile, but the GRU loss drops faster at the beginning of training. The Python code for this figure is available in the source code at the end.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1m6FxElmrdRzP26rnr_593g-2.png\" alt=\"LSTM and GRU results\"\/><figcaption class=\"wp-element-caption\">LSTM and GRU results<\/figcaption><\/figure>\n\n\n\n<p id=\"0fbc\">In theory, GRU should also compute faster than LSTM, since it has fewer gates. 
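<\/p>\n\n\n\n<p>A rough way to see where the speed difference comes from is to count parameters. In PyTorch, each recurrent layer stores an input-to-hidden weight matrix, a hidden-to-hidden weight matrix and two bias vectors, each stacked over the gate blocks. nn.LSTM has 4 gate blocks and nn.GRU has 3, so a GRU layer has three quarters of the parameters of an LSTM layer of the same size. The sketch below does this arithmetic for the first recurrent layer of S_LSTM and S_GRU, using a hypothetical helper, rnn_param_count.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">
```python
def rnn_param_count(input_size, hidden_size, gate_blocks):
    # Per layer, PyTorch stores weight_ih (gate_blocks*h x input_size),
    # weight_hh (gate_blocks*h x h) and two bias vectors of length gate_blocks*h.
    # nn.LSTM uses 4 gate blocks (input, forget, cell candidate, output);
    # nn.GRU uses 3 (reset, update, new).
    h = hidden_size
    return gate_blocks * (h * input_size + h * h + 2 * h)


# First recurrent layer of S_LSTM and S_GRU: input_size=4, hidden_size=64
lstm_params = rnn_param_count(4, 64, gate_blocks=4)
gru_params = rnn_param_count(4, 64, gate_blocks=3)
print(lstm_params, gru_params, gru_params / lstm_params)  # 17920 13440 0.75
```
<\/pre>\n\n\n\n<p>The LSTM figure matches sum(p.numel() for p in nn.LSTM(4, 64).parameters()). Parameter count is only a proxy for speed. Actual run times also depend on the hardware, the sequence length and the underlying cuDNN kernels, so the measured gap need not be exactly 25%.<\/p>\n\n\n\n<p>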
Our training runs confirm this: as the highlighted area below shows, the single-layer GRU trained 8 seconds faster than the single-layer LSTM, and the double-layer GRU trained 3 seconds faster than the double-layer LSTM.<\/p>\n\n\n\n<figure id=\"6aeb\" class=\"graf graf--figure graf-after--p\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter caption-align-center graf-image\"><img decoding=\"async\" src=\"https:\/\/tejwin20260323.j.webweb.today\/wp-content\/uploads\/1fH40K7sEjx1-SJmdXAI4oQ-2.png\" alt=\"Runtime comparison\"\/><figcaption class=\"wp-element-caption\">Runtime comparison<\/figcaption><\/figure>\n\n\n\n<p id=\"ebac\">In general, both LSTM and GRU predict well in this case, and the simpler structure of GRU gives it a speed advantage. Because we examine only one stock over the period from 2019 to 2022, we cannot statistically conclude that either LSTM or GRU is the best model for stock prediction. However, based on the conclusions of&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/data-analysis-lstm-trading-signal-judgment-edf67584a564\" target=\"_blank\" rel=\"noopener\">\u3010Data Analysis\u3011LSTM Trading Signal Judgment<\/a>&nbsp;and this experiment, we believe GRU and LSTM can serve as auxiliary tools in a stock selection strategy.
By combining them with technical indicators such as Bollinger Bands and MACD (see&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/application-bollinger-bands-trading-strategy-23f023f686a9\" target=\"_blank\" rel=\"noopener\">\u3010Application\u3011Bollinger Bands Trading Strategy<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/quant-8-backtesting-by-macd-indicator-f0dc6ceecef2\" target=\"_blank\" rel=\"noopener\">\u3010Quant(8)\u3011Backtesting by MACD Indicator<\/a>), we can build a solid trading strategy.<\/p>\n\n\n\n<p id=\"2adc\">Last but not least, please note that \u201c<strong>The stocks mentioned in this article are for discussion only; please do not consider them recommendations or suggestions for any investment or product.\u201d<\/strong>&nbsp;If you are interested in topics such as creating trading strategies, performance backtesting, and evidence-based research, you are welcome to purchase the plans offered in&nbsp;<a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/index\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ E Shop<\/a>&nbsp;and use its comprehensive database to find potential events.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"c87b\"><span class=\"ez-toc-section\" id=\"Source_Code\"><\/span>Source Code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/gist.github.com\/tej87681088\/db618cc631c8ae54523e074a1da10f27#file-tejapi_python_gru-ipynb\" class=\"ek-link\" target=\"_blank\" rel=\"noopener\">GitHub<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9f9e\"><span class=\"ez-toc-section\" id=\"Extended_Reading\"><\/span>Extended Reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/application-bollinger-bands-trading-strategy-23f023f686a9\" target=\"_blank\" rel=\"noopener\">\u3010Application\u3011Bollinger Bands Trading
Strategy<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/medium.com\/tej-api-financial-data-anlaysis\/quant-seeking-alpha-4d9ae499f9e0\" target=\"_blank\" rel=\"noopener\">\u3010Quant\u3011Seeking Alpha<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"b12b\"><span class=\"ez-toc-section\" id=\"Related_Link\"><\/span>Related Link<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/api.tej.com.tw\/index.html\" rel=\"noreferrer noopener\" target=\"_blank\">TEJ API<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/eshop.tej.com.tw\/E-Shop\/Edata_intro\" target=\"_blank\" rel=\"noreferrer noopener\">TEJ E-Shop<\/a><\/li>\n<\/ul>\n\n\n\n<p><em>We share financial database applications every week. Any feedback is welcome; please feel free to leave a comment below.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Highlights: Preface Chasing profit and avoiding risk are innate to all investors. One way to achieve these goals is to predict future stock movements. In the past, time series models such as ARIMA and GARCH were widely used to characterize the trajectory of future stock prices.
Nowadays, with the boom in artificial intelligence, [&hellip;]<\/p>\n","protected":false},"featured_media":6691,"template":"","tags":[2998],"insight-category":[690,50],"class_list":["post-14509","insight","type-insight","status-publish","has-post-thumbnail","hentry","tag-stock-predict","insight-category-data-analysis","insight-category-fintech"],"acf":[],"_links":{"self":[{"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/insight\/14509","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":1,"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/insight\/14509\/revisions"}],"predecessor-version":[{"id":43942,"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/insight\/14509\/revisions\/43942"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/media\/6691"}],"wp:attachment":[{"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/media?parent=14509"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/tags?post=14509"},{"taxonomy":"insight-category","embeddable":true,"href":"https:\/\/tejwin20260323.j.webweb.today\/en\/wp-json\/wp\/v2\/insight-category?post=14509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}