Development of an Artificial Neural Network to Interpret Froth Images from a Copper Flotation Process

Vladimir M. Rakocevic
and
John A. Meech
Department of Mining and Mineral Process Engineering
University of British Columbia
Vancouver, British Columbia

Abstract

In this project an artificial neural network has been used to interpret froth images in a copper flotation process. As there is often a lack of adequate process instrumentation for control of flotation, a neural network can be combined with an image analysis system to classify froth types based on features extracted from such images. Output can be used to indicate good or bad working conditions.

This paper describes the network structure and learning approach used to set up the system. The learning cycle employed 33 data sets, allowing rapid restructuring of network weights. An example is given illustrating the capabilities of the developed network and the consideration of real-time applications.

Background

Work with neural network rule-based systems, which can be used to simulate expert performance, can be divided into two areas. The first type, exemplified by the work of Touretzky and Hinton [3], simulates the firing of rules, as in production systems. They consider the problem of pattern-matching and variable-binding within the connectionist paradigm, operations that are critical in implementing a standard AI production system.

The second type of system [2] uses a neural network learning algorithm, such as error back-propagation, to learn relationships between inputs and outputs; such relationships are normally considered expert knowledge. Saito and Nakano (1988) applied a connectionist network to a problem in medical diagnosis that has traditionally been an area for rule-based systems. They used a 3-layer feed-forward network to represent the diagnostic information. It had 216 input units, divided into small sets, each set corresponding to a question. For each possible response to a question, one of the input units in the set corresponding to that question was activated. The middle layer consisted of 72 hidden units; the output layer used 23 units corresponding to the 23 diseases contained within the system. Each unit in a layer was connected to every unit in the layer above it. The back-propagation learning algorithm was used to train the system [1].

In many copper flotation concentrators, there is often a lack of adequate process instrumentation. Only in certain plants will one find particle size monitoring, flotation feed density gauges, flotation pH meters, and on-stream assays of intermediate process streams. Furthermore, a flotation operator often has limited freedom to vary such variables as flotation feed density because, to prevent plugging, the tailing pulp density must be kept at a level that may differ from the optimum for flotation. As a result, over years of practice, many skilled operators develop an operating strategy based on observations of froth conditions. An experienced operator can tell when too much pine oil is being added or less air is needed at the scavengers simply by examining froth characteristics. Conventional philosophy is to maintain optimum froth conditions and then fine-tune reagents to increase recovery. It can take many years for a novice operator to learn how to "read" the froth and attain proficiency in the art of flotation [5].

So, there is a need for automated approaches, particularly ones which are adaptable and able to learn from past data or recent experience.

Copper Flotation and Froth Images

In the froth images used in this work, experienced flotation plant operators recognize eight types of froth:

Figure 1 shows the 8 types of froth. While these classifications are probably site-specific depending on ore type, company culture and language, it is believed that many mills around the World use a number of these descriptions or variations thereof to control their specific plant.

Each froth type is characterized by 6 main features:

An Optimum froth has recovery bubbles as large as an egg with large and strong windows; it will possess a high degree of mineralization and a bright metallic colour; it will run steadily. The remaining froth types are defined in a similar fashion. Table 1 gives typical feature descriptions.

Based on the froth type observed, different responses are implemented such as: reduce collector, increase frother, decrease air flow, check for oil spills, etc.

Table 1: Typical Feature Descriptions

Motivation and Aim of this Work

We wanted to create an ANN that would perform as well as an expert operator at interpreting froth characteristics. The main advantage of a neural network approach is that learning or adaptation can be done with relative ease. On the basis of this advantage, a system is able to change to meet new circumstances.

For example, two networks can be run in parallel: one in off-line and one in on-line mode. When we introduce irregularities or new working conditions into the process requiring retraining, we can easily switch between the on-line and off-line networks. The former off-line system will now monitor the process while the previous on-line system receives training about the new data in an off-line mode.
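The switching arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: the class name NetworkPair, its methods, and the representation of a network as a plain callable are hypothetical and are not taken from the system described in this paper.

```python
class NetworkPair:
    """Two networks run side by side: one monitors, one is free for retraining.

    Hypothetical helper for illustration; a "network" here is any callable
    that maps a list of input activations to a classification.
    """

    def __init__(self, online, offline):
        self.online = online    # network currently monitoring the process
        self.offline = offline  # network currently available for retraining

    def classify(self, features):
        # Only the on-line network answers live queries.
        return self.online(features)

    def swap(self):
        # The former off-line network takes over monitoring.
        self.online, self.offline = self.offline, self.online

    def retrain_offline(self, retrain):
        # `retrain` is a user-supplied function that returns an updated network.
        self.offline = retrain(self.offline)


# Stand-in "networks" that always give the same answer.
pair = NetworkPair(online=lambda f: "A", offline=lambda f: "B")
first = pair.classify([0.0])
pair.swap()                                      # B now monitors
pair.retrain_offline(lambda net: (lambda f: "A retrained"))
during_retraining = pair.classify([0.0])         # monitoring is uninterrupted
pair.swap()                                      # retrained network returns on-line
after_swap_back = pair.classify([0.0])
```

The point of the design is that monitoring never stops: retraining touches only the network that is not currently answering queries.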

In addition, we can build a fuzzy neural network to allow introduction of ambiguity in the definition of working conditions. For example, the colour of the froth may be seen as something between dark brown and dark gray. The operator is not certain about the exact colour so belief in the feature of the froth image can be input as well: "I am 60% sure it is dark brown and 40% sure it is dark gray".
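As a minimal sketch of how such a belief statement might be turned into network input, assume that each colour value has its own input node (the one-node-per-value encoding discussed later in this paper) and that a degree of belief is used directly as the node activation. The list of colour values and the function name fuzzy_nodes are assumptions made for illustration only.

```python
# Assumed colour values, for illustration only; the paper's full list is not reproduced here.
COLOUR_VALUES = ["bright metallic", "dark brown", "dark gray"]


def fuzzy_nodes(values, beliefs):
    """Return one activation per value; `beliefs` maps value -> belief in [0, 1]."""
    unknown = set(beliefs) - set(values)
    if unknown:
        raise ValueError(f"unknown values: {sorted(unknown)}")
    return [float(beliefs.get(value, 0.0)) for value in values]


# "I am 60% sure it is dark brown and 40% sure it is dark gray"
activations = fuzzy_nodes(COLOUR_VALUES, {"dark brown": 0.6, "dark gray": 0.4})
print(activations)  # [0.0, 0.6, 0.4]
```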

Selecting a Network Structure

The first step in developing a system is to select a network architecture that encompasses the requirements of the application. Our problem is one of classification, for which Saito and Nakano recommend a feed-forward 3-layer network with a back-propagation learning algorithm [2]. More complex architectures are available but the need for such complexity is not obvious. For real-time applications, simplicity is probably the best approach.

The number of nodes in the input layer is determined by the total number of input variables and the discrete terms used to describe each input. The number of nodes in the output layer is determined by the number of output variables and their discrete descriptions. The hidden layer is designed using heuristics.

The Back-Propagation Learning Algorithm

Back-propagation compares the network response to the desired response, and adjusts the network weights so that when the same input is presented to the network, the output will be closer to the desired value. Back-propagation is the predominant method of supervised training which requires a reliable set of input/output facts and a good sense of the topology appropriate to the problem.

Figure 2 shows a partial structure of a 3-layer network. All input connections are summed and then passed through a Sigmoid threshold function to produce output. This function serves to allow differentiation of the output signal for the back-propagation technique to compute delta-values for each connection.
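The reason differentiability matters can be shown with a short sketch. For the sigmoid, the derivative can be written in terms of the output alone, y x (1 - y), which is where the O x (1 - O) and H x (1 - H) factors in the learning steps that follow come from. The function names below are illustrative, and the numerical check is only a sanity test of that identity.

```python
import math


def sigmoid(x):
    """Sigmoid threshold function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope_from_output(y):
    """Derivative of the sigmoid expressed through its own output y."""
    return y * (1.0 - y)


# Compare the closed form with a central-difference estimate at one point.
x = 0.5
step = 1e-6
numeric_slope = (sigmoid(x + step) - sigmoid(x - step)) / (2 * step)
closed_form_slope = sigmoid_slope_from_output(sigmoid(x))
```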

The process by which a network runs and learns is as follows:

1. Compute the hidden-layer neuron activations:

H = F( I x W1)

where H is the vector of hidden-layer neurons, I is the vector of input-layer neurons and W1 is the weight matrix for input layer to hidden layer connections.

2. Compute the output layer neuron activations:

O = F( H x W2)

where O represents the output-layer vector, H is the hidden-layer vector and W2 is the matrix of synaptic weights connecting the hidden and output layers. In both equations, F( ) is the Sigmoid activation function F(x) = 1 / (1 + e^(-x)), where x is I x W1 or H x W2 for the hidden and output layers respectively.

3. Compute the derivative of the output-layer error (the difference between the target and the observed output):

d = O x ( 1 - O ) x ( T - O )

where d is the derivative of the error vector for each output neuron, O is the output-layer vector and T is the target activation vector of the output-layer.

4. Adjust the weights of the output layer of synapses:

W2(k) = W2(k-1) + A2(k)

where A2 is a matrix representing the change in matrix W2, computed as follows:

A2(k) = L x H x d + M x A2(k-1)

L is the learning rate, which sets the degree to which this dataset changes the weights, and M is a momentum factor that allows the previous weight change to influence the weight change in the current cycle, k. In principle these factors can take any non-negative value, but more reasonably they should range from 0 to 1.0 with the sum of the two equal to 1.0 (this is not strictly necessary but is a sensible approach).

5. Compute the hidden-layer error differential in much the same way:

e = H x (1 - H) x W2(k-1) x d

where e is the differential of the error vector for each hidden-layer neuron.

6. Adjust the weights for the first layer of synapses:

W1(k) = W1(k) is obtained as W1(k-1) + A1(k), that is:

W1(k) = W1(k-1) + A1(k)

where A1(k) = L x I x e + M x A1(k-1)

Repeat steps 1 to 6 on all I/O patterns until the output-layer error (vector d) is within a specified tolerance for each pattern and for each neuron. The values of L and M can be set to different levels for the input and hidden layers if desired.
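Steps 1 to 6 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the class name ThreeLayerNet, the initial weight range, the absence of bias units and the layer sizes in the usage lines are all assumptions. The output-layer error is taken as target minus output, so that adding the weight change moves the output toward the target.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Illustrative 3-layer network; W1[i][j] links input i to hidden j,
    W2[j][k] links hidden j to output k. No bias units (an assumption)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def small(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        def zeros(rows, cols):
            return [[0.0] * cols for _ in range(rows)]

        self.W1, self.W2 = small(n_in, n_hidden), small(n_hidden, n_out)
        # A1, A2 hold the previous weight changes for the momentum term.
        self.A1, self.A2 = zeros(n_in, n_hidden), zeros(n_hidden, n_out)

    def forward(self, I):
        # Step 1: H = F(I x W1); step 2: O = F(H x W2)
        H = [sigmoid(sum(x * row[j] for x, row in zip(I, self.W1)))
             for j in range(len(self.W2))]
        O = [sigmoid(sum(h * row[k] for h, row in zip(H, self.W2)))
             for k in range(len(self.W2[0]))]
        return H, O

    def train_step(self, I, T, L1=0.6, M1=0.4, L2=0.7, M2=0.3):
        H, O = self.forward(I)
        # Step 3: output-layer error derivative (target minus output).
        d = [o * (1 - o) * (t - o) for o, t in zip(O, T)]
        # Step 5 uses W2(k-1), so it is computed before W2 is changed.
        e = [h * (1 - h) * sum(w * dk for w, dk in zip(row, d))
             for h, row in zip(H, self.W2)]
        # Step 4: A2(k) = L x H x d + M x A2(k-1); W2(k) = W2(k-1) + A2(k)
        for j, h in enumerate(H):
            for k, dk in enumerate(d):
                self.A2[j][k] = L2 * h * dk + M2 * self.A2[j][k]
                self.W2[j][k] += self.A2[j][k]
        # Step 6: A1(k) = L x I x e + M x A1(k-1); W1(k) = W1(k-1) + A1(k)
        for i, x in enumerate(I):
            for j, ej in enumerate(e):
                self.A1[i][j] = L1 * x * ej + M1 * self.A1[i][j]
                self.W1[i][j] += self.A1[i][j]


# Sanity check on a single made-up pattern: repeated presentation should
# bring the output closer to the target.
net = ThreeLayerNet(n_in=3, n_hidden=4, n_out=2)
pattern, target = [1.0, 0.0, 0.0], [1.0, 0.0]
_, before = net.forward(pattern)
for _ in range(500):
    net.train_step(pattern, target)
_, after = net.forward(pattern)
error_before = max(abs(t - o) for t, o in zip(target, before))
error_after = max(abs(t - o) for t, o in zip(target, after))
```

The separate L1/M1 and L2/M2 arguments reflect the remark that L and M can be set to different levels for the two layers of weights.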

The input and output layer sizes depend on the application (the numbers of input and output factors). The hidden-layer size, though, is unspecified. A good "rule of thumb" is for this layer to be somewhere in-between the input-layer size (which is generally larger) and the output-layer size (which is possibly quite small).

The back-propagation network has the ability to learn any arbitrarily complex nonlinear mapping due to the introduction of the hidden-layer. It also has a capacity much greater than the dimensionality of its input and output layers as interpolation between input and output training sets is an important attribute of such networks. This means that even with limited training data, a network can be used to identify patterns not presented during training.

Unfortunately, back-propagation can involve extremely long, potentially unbounded training times. If there are strong relationships between inputs and outputs and you are willing to accept results with a relatively broad tolerance, training time may be reasonable. However, for applications where the relationships are subtle and where predictions must be relatively accurate, training may take several days [6]. Training time also depends strongly on the hardware used.

Collecting and Processing Data for the Network

There are two possible approaches to this particular issue. The number of nodes in the input and output layer will be determined by which approach is selected. The input data must contain descriptions of froth images. The desired output will be the particular type of froth represented by the input data.

Each of the main characteristics that define a froth type has several values. We can process these descriptions in two ways:

First Approach:

We can have as many input nodes as there are main characteristics. In our problem we have 6 different main features that describe each type of froth. Each feature then can be scaled (0-10, or 0-1) and a specific number assigned to represent each input value, say from low to high, or small to large, etc.

For Example: Feature - bubble size

By describing the recovery bubbles as small, we will activate the node corresponding to "size of the bubbles" with a value 0.4.
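One way to arrive at a value such as 0.4 is sketched below. It assumes that the five bubble-size descriptions used later in the operator prompt are ordered from smallest to largest and spaced evenly on a 0-1 scale; both the even spacing and the function name scaled_activation are assumptions for illustration.

```python
# Bubble-size descriptions, ordered from smallest to largest.
BUBBLE_SIZES = ["very small", "small", "egg size", "large", "baseball size"]


def scaled_activation(values, description):
    """Map the n-th of len(values) ordered descriptions to n / len(values)."""
    rank = values.index(description) + 1  # raises ValueError for unknown text
    return rank / len(values)


small_activation = scaled_activation(BUBBLE_SIZES, "small")
print(small_activation)  # 0.4
```

With this encoding the whole feature occupies a single input node, so six features need only six input nodes.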

Second Approach:

Another approach is to assign one node to each distinct value. In other words, bubble size will have 5 nodes. If it is described as "small", only that node corresponding to "small" in the "bubble set" will be activated. This approach demands a "bigger" and more complex network with more connections but also provides a more discretized structure.
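The one-node-per-value encoding for a single feature can be sketched as follows. The bubble-size descriptions are those used in the operator prompt later in this paper; the function name one_hot is illustrative.

```python
BUBBLE_SIZES = ["very small", "small", "egg size", "large", "baseball size"]


def one_hot(values, description):
    """Activate only the node that corresponds to `description`."""
    if description not in values:
        raise ValueError(f"unknown description: {description!r}")
    return [1.0 if value == description else 0.0 for value in values]


bubble_nodes = one_hot(BUBBLE_SIZES, "small")
print(bubble_nodes)  # [0.0, 1.0, 0.0, 0.0, 0.0]
```

The full input vector would be the concatenation of one such group of nodes per feature.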

The number of output nodes depends on how many distinct froth types there are. In our case, 8 different froth types were examined.

Table 2 shows the number of nodes in each layer for both approaches.

Table 2: Nodes and Connections for each Network Approach.

The number of hidden nodes in approach 2 was determined by trial and error, starting with 10 nodes and examining the error and time to learn. The system failed to perform successfully with fewer than 11 nodes and no significant improvement in accuracy was detected above 15.

Training and Testing the Neural Network

First Approach:

This approach, at first examination, would appear to be preferred because it is less complex. However, on running this system, we found it unable to achieve better than 84% accuracy even with extended learning times. As a result, we abandoned this technique. Using a larger number of hidden neurons might have improved the accuracy of this method.

Second Approach:

The training data comprised 33 samples, i.e., 33 descriptions of froth images defined by an experienced operator. On each iteration, a single data set was presented to the network on a random selection basis. It took a total of 15,000 iterations for the designed neural network to learn these 33 samples. A 486 DX2-66MHz microcomputer processed this learning in 2 minutes for the second network architecture. We used 5% as our acceptable error level for learning.
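The presentation schedule described above (one randomly selected data set per iteration, stopping at an acceptable error level) can be sketched as a small harness. This is an illustration only: the function name train_until_tolerance is hypothetical, the 5% level is interpreted here as an absolute tolerance of 0.05 on every output node, and the toy learner in the usage lines is a stand-in table that halves its error on each presentation, not a neural network.

```python
import random


def train_until_tolerance(patterns, train_step, predict,
                          tolerance=0.05, max_iterations=15000, seed=0):
    """Present one randomly chosen pattern per iteration. Return the iteration
    at which every output of every pattern is within `tolerance` of its target,
    or None if that never happens within `max_iterations`."""
    rng = random.Random(seed)
    for iteration in range(1, max_iterations + 1):
        inputs, target = rng.choice(patterns)
        train_step(inputs, target)
        # Checking all patterns is costly, so do it once per len(patterns) steps.
        if iteration % len(patterns) == 0 and all(
            abs(t - o) <= tolerance
            for x, tgt in patterns
            for t, o in zip(tgt, predict(x))
        ):
            return iteration
    return None


# Toy stand-in learner: a table whose stored output moves halfway to the target.
memory = {}


def predict(x):
    return memory.get(tuple(x), [0.5])


def train_step(x, target):
    memory[tuple(x)] = [o + 0.5 * (t - o) for o, t in zip(predict(x), target)]


patterns = [([1.0, 0.0], [1.0]), ([0.0, 1.0], [0.0])]
iterations_needed = train_until_tolerance(patterns, train_step, predict)
```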

When studying the learning coefficient and momentum value, we noticed that for fast and efficient learning, the momentum value should be less than or equal to the learning coefficient, and should be reduced for the output layer. Our best performance was obtained with L = .7 and M = .3 for the output layer and L = .6 and M = .4 for hidden layer. This choice may depend on the degree of noise contained within the training data.

We used 70 datasets to test the network, the original 33 used for training plus 37 previously unseen samples. For the training data, the average error per output neuron was 2%. The network response was 93% accurate for the previously unseen data. The highest error in one case was 16% per output node. However, even in this case, if the network selected the neuron with the highest output strength, the correct froth type was chosen.
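The two kinds of figure quoted above, an average error per output neuron and a decision based on the neuron with the highest output strength, can be computed as in the sketch below. The function names and the three sample outputs are made up for illustration; they are not data from this study, and the paper does not specify exactly how its accuracy figure was calculated.

```python
def mean_error_per_output(outputs, targets):
    """Average absolute difference between output and target over all nodes."""
    total = sum(abs(o - t)
                for out, tgt in zip(outputs, targets)
                for o, t in zip(out, tgt))
    return total / sum(len(tgt) for tgt in targets)


def winner_accuracy(outputs, targets):
    """Fraction of samples whose strongest output node matches the target node."""
    hits = sum(out.index(max(out)) == tgt.index(max(tgt))
               for out, tgt in zip(outputs, targets))
    return hits / len(targets)


# Made-up two-class example: the third sample is misclassified.
sample_outputs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
sample_targets = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mean_error = mean_error_per_output(sample_outputs, sample_targets)
accuracy = winner_accuracy(sample_outputs, sample_targets)
```

A sample can show a sizeable per-node error and still be classified correctly by the strongest node, which is the situation described for the 16% case.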

We took care not to train and test the designed neural network with data inputs that are mutually exclusive. You can't have small recovery bubbles with large and strong windows. That type of froth does not exist. In our opinion this is analogous to teaching children to add 2+5 and providing them with answers such as: it is 7, but it is not 6, and it is not 8, etc. We found that a more complicated architecture and greater learning time is needed to teach the network such exceptions. There are 12,600 possible input combinations. Often input data are not strictly independent facts, so to avoid problems with misinterpreted data entry, there are several possible solutions:

The designed neural network showed a high degree of generalization. It would be useful to test the network with some of the "non-existent" descriptions of froth types, as inexperienced operators may often provide faulty measurements. The unseen data used to test our network included several examples of such "noisy" data, with the greatest noise level being 33% per node.

On the basis of the above performance, we believe the second approach is the most appropriate design for such a neural network. As final support for why this solution is better than a network with fewer input nodes, should a fuzzy-neural network be desired with input data that is not strictly classified, the more discrete system can be used to introduce a degree of belief in each value for each property type. The first approach does not permit this entry of uncertainty.

Building the Application (User I/O Routine)

The network was designed and implemented using the NeuralWorks Professional II/Plus software package. One of the most powerful features within this package is its User I/O facility [4]. All I/O programs are user-written "C" language programs that interact with NeuralWorks. User I/O allows complete control over the data presented to and the results returned from the network.

The designed I/O routine prompts an operator to answer questions describing the froth image. The first question is:

"Describe recovery bubble size (press 1,2,3,4 or 5)":

  1. very small
  2. small
  3. egg size
  4. large
  5. baseball size

The answer provides input for the first set of nodes in the input layer. Pressing 1 means node #1 will be activated, etc. All other froth characteristics are handled in a similar way. Once all inputs are known, the result is taken from the neural network and processed through the I/O routine to give an answer such as:

"Described froth type is optimum froth".

There is complete freedom to present output from the neural network with the user I/O routine. Our system finds the maximum value of the nodes in the output layer, and if this is greater than a threshold (i.e. 0.5), the corresponding node gives the answer of the froth type. Alternatively the system can be designed to provide output from the network as degrees of belief that certain descriptions belong to one or more types of froth, which can then be post-processed in an accompanying real-time diagnostic or control expert system.
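The maximum-and-threshold rule can be sketched as below. The function name decode is illustrative, and apart from "optimum" the froth-type labels are placeholders, since the names of the other types are not listed in the text. The sketch is written in Python for brevity and does not use the NeuralWorks User I/O interface.

```python
# "optimum" is from the text; the other seven labels are placeholders.
FROTH_TYPES = ["optimum"] + [f"froth type {n}" for n in range(2, 9)]


def decode(outputs, labels, threshold=0.5):
    """Return the label of the strongest output node, or None if it does not
    exceed the threshold."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best] if outputs[best] > threshold else None


confident = decode([0.92, 0.05, 0.10, 0.02, 0.01, 0.03, 0.04, 0.02], FROTH_TYPES)
undecided = decode([0.30, 0.28, 0.10, 0.02, 0.01, 0.03, 0.04, 0.02], FROTH_TYPES)
if confident is not None:
    print(f"Described froth type is {confident} froth")
```

Returning None when no node exceeds the threshold leaves room for the alternative mentioned above, in which the raw outputs are passed on as degrees of belief.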

Conclusion

The designed neural network handles this classification problem well. It learned the 33 training descriptions in about 2 minutes and shows a high degree of generalization for real unseen descriptions.

Uncertainty and fuzziness can be introduced in the data input, and output can be presented as the network's belief that some descriptions correspond to certain froth types.

The next phase of this project will examine the automated input of data from video images from a TV camera mounted above a flotation cell. This will avoid the entry of misinterpreted data by novice operators.

Acknowledgment

The Natural Sciences and Engineering Research Council of Canada is acknowledged for financial support of this work through Operating Grant 45717. Professor M.P. Beddoes of the UBC Electrical Engineering Department is acknowledged for his assistance and support throughout this project. Hans Raabe of Highland Valley Copper is thanked for providing the froth images.

References

1. Zeidenberg, M., 1990. "Neural Networks in Artificial Intelligence", Ellis Horwood Limited, Chichester, W. Sussex, England.

2. Saito, K. & Nakano, R., 1988. "Medical Diagnostic Expert System Based on PDP Model", Proc. IEEE Inter. Conf. on Neural Networks, San Diego, CA.

3. Touretzky, D.S. & Hinton, G.E., 1988. "A Distributed Connectionist Production System", Cognitive Science, 12(3), pp. 423-466.

4. NeuralWare Inc., 1991. "An Extended Tutorial for NeuralWorks Professional II/Plus and NeuralWorks Explorer", NeuralWare Inc., Pittsburgh, Penn.

5. Benford, P.M. & Meech, J.A., 1992. "Advising Flotation Operators Using a Real-Time Expert System", Minerals Engineering, 5(10-12), pp. 1325-1331.

6. Blum, A., 1992. "Neural Networks in C++", Wiley Prof. Comp., New York, NY.

Appendix 1

Appendix 2

