network developed in the "learning" process represents a pattern detected in the data.

Thus, in principle, ANN methods can be applied to many research issues such as

those in coastal engineering and oceanography. Theoretically, as long as the training

data set covers the maximum range of the forecasting boundary data, a short-term

data set can be used to train an ANN model for long-term predictions. A trained

neural network can provide a much faster simulation for forecasting long-term events

than traditional hydrodynamic models since its calculation requires no computational

iteration. The implementation of an ANN model is similar to evaluating a multiple-variable linear regression function: Output *Y*(*t*) = ANN[*w*1·*X*1(*t*), *w*2·*X*2(*t*), ..., *wn*·*Xn*(*t*)], where *wi* (*i* = 1, ..., *n*) are the weights of the ANN network, *Xi* (*i* = 1, ..., *n*) are the input signals, and *Y* is the output signal.
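As an illustration only (not part of the original study), the forward calculation above, with a log-sigmoid hidden layer of the kind employed later in this section, can be sketched in Python; the array sizes and random weights are hypothetical:

```python
import numpy as np

def logsig(x):
    """Log-sigmoid transfer function: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def ann_forward(X, W1, b1, W2, b2):
    """Three-layer feed-forward pass: weighted inputs -> log-sigmoid
    hidden layer -> linear output Y(t)."""
    hidden = logsig(X @ W1 + b1)   # hidden-layer activations
    return hidden @ W2 + b2        # output signal Y(t)

# Hypothetical example: 2 input signals, 3 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3)); b1 = rng.standard_normal(3)
W2 = rng.standard_normal((3, 1)); b2 = rng.standard_normal(1)
X = np.array([[0.5, -1.2]])       # one time step of inputs X1(t), X2(t)
Y = ann_forward(X, W1, b1, W2, b2)
```

Because the trained pass is just this sequence of matrix products and transfer functions, forecasting requires no iterative solution, which is the source of the speed advantage over hydrodynamic models noted above.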

The standard gradient-descent training method sometimes suffers from slow con-

vergence due to the presence of one or more local minima. This is generally a charac-

teristic of the particular error surface, which is often composed of several flat and

steep regions. There are, however, several optimization methods that can be used to improve the convergence speed and the performance of network training. In this study, the conjugate gradient optimization technique is used.
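To illustrate the idea, the error surface of a small network can be minimized with a general-purpose conjugate-gradient optimizer; the SciPy routine, synthetic data, and network size below are illustrative stand-ins, not the study's Matlab implementation:

```python
import numpy as np
from scipy.optimize import minimize

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack(theta, n_in, n_hid):
    """Split a flat parameter vector into layer weights and biases."""
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid].reshape(n_hid, 1); i += n_hid
    b2 = theta[i:i + 1]
    return W1, b1, W2, b2

def mse_loss(theta, X, y, n_in, n_hid):
    """Mean squared error of the three-layer network on (X, y)."""
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    pred = logsig(X @ W1 + b1) @ W2 + b2
    return np.mean((pred.ravel() - y) ** 2)

# Hypothetical target: y = sin(x), one input, 5 hidden neurons
rng = np.random.default_rng(1)
X = np.linspace(-2, 2, 40).reshape(-1, 1)
y = np.sin(X).ravel()
n_in, n_hid = 1, 5
theta0 = 0.5 * rng.standard_normal(n_in * n_hid + n_hid + n_hid + 1)

# Conjugate-gradient minimization of the network's error surface
res = minimize(mse_loss, theta0, args=(X, y, n_in, n_hid), method="CG")
```

The conjugate-gradient method chooses successive search directions that account for the curvature of the error surface, which is why it tends to cross the flat regions mentioned above faster than plain gradient descent.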

Overfitting is another problem that may occur during neural network training. The

error on the training set is driven to a very small value, but when new data is

presented to the network, the error is large. In this case, the network has memorized

the training examples, but has not learned to generalize to new situations. One useful

approach for improving network generalization is to use a network that is just large enough to provide an adequate fit. The larger a network is, the more

complex the functions that the network can create, which may lead to overfitting. If

a small enough network is used, it will not have enough power to overfit the data. However, it is difficult to know beforehand just how large

a network should be for a specific application. In general, the optimal network size

to prevent overfitting can be determined through model sensitivity experiments.
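A sensitivity experiment of the kind described, comparing validation error across candidate hidden-layer sizes, might look like the following sketch; the scikit-learn regressor, synthetic data, and size grid are illustrative assumptions, not the procedure used in the study:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical data: noisy samples of y = sin(x)
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(200)

# Hold out part of the data to measure generalization, not training fit
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

results = {}
for n_hidden in (1, 3, 5, 10, 30):
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                       activation="logistic",   # log-sigmoid hidden layer
                       solver="lbfgs", max_iter=2000, random_state=0)
    net.fit(X_tr, y_tr)
    err = np.mean((net.predict(X_va) - y_va) ** 2)  # validation MSE
    results[n_hidden] = err

# Smallest validation error suggests an adequately sized network
best = min(results, key=results.get)
```

An oversized network may drive the training error very low while its validation error grows, which is precisely the memorization-without-generalization behavior described above.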

In this study, the standard three-layer feed-forward backpropagation network

(Haykin, 1999) with a nonlinear differentiable log-sigmoid transfer function in the

hidden layer (Fig. 5) was employed. The network programming was done using the

Matlab computer software (MathWorks, 1999). Huang and Fu's (2002) study indicates that using an optimized conjugate gradient training method improves both training speed and accuracy. In general, network training with the conjugate gradient method is about three times faster than when the standard gradient