
Nonnegative matrix factorization (NMF) is a relatively new unsupervised learning algorithm that decomposes a nonnegative data matrix into a parts-based, lower dimensional, linear representation of the data. NMF has applications in image processing, text mining, recommendation systems and a variety of other fields. Since its inception, the NMF algorithm has been modified and explored by numerous authors. One such modification involves the addition of auxiliary constraints to the objective function of the factorization. The purpose of these auxiliary constraints is to impose task-specific penalties or restrictions on the objective function. Though many auxiliary constraints have been studied, none have made use of data-dependent penalties. In this paper, we propose Zellner nonnegative matrix factorization (ZNMF), which uses data-dependent auxiliary constraints. We assess the facial recognition performance of the ZNMF algorithm and several other well-known constrained NMF algorithms using the Cambridge ORL database.

Visual recognition tasks have become increasingly popular and complex in the last several decades as they often involve massively large datasets. Facial detection and recognition tasks are particularly of interest and can be severely complicated due to variation in illumination, emotional expression as well as physical location and orientation of the face within an image. Due to the often massive size of facial image datasets, subspace methods are frequently used to identify latent variables and reduce data dimensionality, so as to produce apposite representations of facial image databases.

Nonnegative matrix factorization (NMF) is a relatively new unsupervised learning subspace method that was first introduced in 1999 by Lee and Seung [ ].

Suppose X ∈ ℝ^{p×n} is a nonnegative data matrix. NMF seeks nonnegative factors W and H such that X ≈ WH, found by minimizing either the squared Frobenius norm

D_{F}(X, WH) = ||X − WH||_{F}^{2} = Σ_{ij} (X_{ij} − (WH)_{ij})^{2} (1)

or the Kullback-Leibler divergence

D_{KL}(X||WH) = Σ_{ij} [ X_{ij} log( X_{ij} / (WH)_{ij} ) − X_{ij} + (WH)_{ij} ]. (2)

Many authors have adapted the NMF algorithm by altering either the cost function formulation [ ] or the constraints imposed upon it, most commonly by adding weighted penalty terms to the objective:

C(W, H) = ||X − WH||_{F}^{2} + αJ_{1}(W) + βJ_{2}(H). (3)

Here J_{1}(W) and J_{2}(H) represent the penalty terms, and 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1 are the regularization parameters that control the relative influence of the constraints. Often a sparsity constraint and an approximation-error constraint are used.

Though there are many adaptations of the NMF algorithm in which auxiliary constraints are imposed on W and H, none of these methodologies make use of data-dependent penalties. Inspired by the so-called Zellner g-Prior [ ], we propose the Zellner nonnegative matrix factorization (ZNMF) algorithm, which employs data-dependent penalty terms.

NMF factorizes a matrix X ∈ ℝ^{p×n} with nonnegative entries as

X ≈ WH,

where W ∈ ℝ^{p×q} is the nonnegative basis matrix and H ∈ ℝ^{q×n} is the nonnegative coefficient matrix, with the factorization rank q chosen so that q ≪ min(p, n). Under the Frobenius cost, the multiplicative updates of Lee and Seung are

H_{aj} ← H_{aj} (W^{T}X)_{aj} / (W^{T}WH)_{aj} (4)

and

W_{ia} ← W_{ia} (XH^{T})_{ia} / (WHH^{T})_{ia}, (6)

where the updates are applied elementwise and leave the cost nonincreasing at every iteration.

Traditional NMF using multiplicative updates is known to converge slowly, as it requires a large number of iterations. Gradient descent and alternating least squares algorithms are commonly used in its place because they require far fewer iterations and thus converge faster; however, we do not explore them in this paper.
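For concreteness, the standard multiplicative-update scheme can be sketched in a few lines of NumPy. This is a minimal illustration of the classical Lee-Seung updates, not the authors' implementation; the small `eps` added to each denominator is a numerical-safety assumption.

```python
import numpy as np

def nmf_multiplicative(X, q, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||X - WH||_F^2.

    Minimal sketch: random nonnegative initialization, elementwise
    updates, with eps added to denominators to avoid division by zero.
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.random((p, q))
    H = rng.random((q, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # coefficient update (4)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # basis update (6)
    return W, H

# Small demonstration on a random nonnegative matrix.
X = np.abs(np.random.default_rng(1).random((30, 20)))
W, H = nmf_multiplicative(X, q=5)
```

Because each update only multiplies by nonnegative ratios, W and H stay nonnegative throughout, which is what makes the scheme attractive despite its slow convergence.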

The standard NMF multiplicative updating algorithms have a continuous descent property. The descent leads to a stationary point within the region under examination; however, it is uncertain whether this stationary point is a local minimum, as it could certainly be a saddle point. This is because the algorithm optimizes W and H iteratively, but never simultaneously.

CNMF [ ] augments the NMF objective with penalty terms J_{1}(W) and J_{2}(H) that serve to apply task-specific, auxiliary constraints on the solutions of (3), where 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1 are regularization parameters. For our purposes we define J_{1}(W) and J_{2}(H) from (3) as follows:

and

while α and β are used to constrain the sparsity and the approximation error, respectively. When the optimization task is that of (3), the multiplicative updates of (4) and (6) are modified as follows:

and
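To illustrate how penalty terms enter the multiplicative updates, the sketch below assumes the common Frobenius-norm penalties J_{1}(W) = ||W||_{F}^{2} and J_{2}(H) = ||H||_{F}^{2}. This choice is an assumption for demonstration only and need not match the paper's exact definitions of J_{1} and J_{2}.

```python
import numpy as np

def cnmf_step(X, W, H, alpha=0.5, beta=0.5, eps=1e-9):
    """One constrained-NMF update, assuming Frobenius-norm penalties
    J1(W) = ||W||_F^2 and J2(H) = ||H||_F^2 (an illustrative choice;
    the paper's J1, J2 may differ). The penalty gradients enter the
    denominators, shrinking W and H toward zero."""
    H *= (W.T @ X) / (W.T @ W @ H + beta * H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + alpha * W + eps)
    return W, H

def penalized_cost(X, W, H, alpha=0.5, beta=0.5):
    """Objective (3) under the assumed Frobenius penalties."""
    return (np.linalg.norm(X - W @ H) ** 2
            + alpha * np.linalg.norm(W) ** 2
            + beta * np.linalg.norm(H) ** 2)

# Demonstration: the penalized cost decreases from a random start.
rng = np.random.default_rng(0)
X = rng.random((20, 15))
W = rng.random((20, 4))
H = rng.random((4, 15))
cost_before = penalized_cost(X, W, H)
for _ in range(25):
    W, H = cnmf_step(X, W, H)
cost_after = penalized_cost(X, W, H)
```

Note how the regularization terms appear only in the denominators: this preserves the multiplicative, nonnegativity-preserving form of the original updates.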

In regression analysis, for a Gaussian distribution with y = Xβ + ε, ε ∼ N(0, σ²I_{n}), the ordinary least squares estimator

β̂ = (X^{T}X)^{−1}X^{T}y

has variance

V(β̂) = σ²(X^{T}X)^{−1}.

The Zellner g-Prior exploits this fact in the Bayesian setting to use the prior

β | σ², g ∼ N(0, gσ²(X^{T}X)^{−1}),

which corresponds to using the penalized empirical risk shown below:

||y − Xβ||² + (1/g) β^{T}(X^{T}X)β.
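A quick numerical check of the g-Prior's shrinkage effect: minimizing a penalized risk of this form yields the OLS estimate shrunk by the factor g/(1 + g), a standard property of the zero-mean g-prior. The data below are synthetic, for illustration only.

```python
import numpy as np

# Known property of the zero-mean Zellner g-prior: the minimizer of
#   ||y - X b||^2 + (1/g) * b^T (X^T X) b
# is the OLS estimate shrunk by the factor g / (1 + g).
rng = np.random.default_rng(0)
n, k, g = 100, 3, 25.0
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Ordinary least squares solution.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Penalized solution: (1 + 1/g) X^T X b = X^T y.
beta_gprior = np.linalg.solve((1.0 + 1.0 / g) * (X.T @ X), X.T @ y)
```

Because the penalty reuses X^{T}X as its metric, the shrinkage is uniform across coefficients rather than coordinate-wise, which is exactly the data-dependence that ZNMF borrows.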

We extend and adapt Zellner’s ideas as follows:

where S = (X^{T}W) is n × q, S^{T} is q × n and S^{T}S is q × q.

where R = XH^{T} is p × q and represents the projection weighting. R^{T}R is q × q, and its diagonal essentially captures the idiosyncratic variance of the projections onto the lower-dimensional space.

As can be seen in (17) and (18), the updates of both W and H are simply post- or pre-weighted by the input-space variances or the data-space variances.
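The bookkeeping for these weighting matrices can be verified with a short shape check. The dimensions follow the ORL setup described later (p = 644 pixels, n = 200 images); the matrices here are random stand-ins, not learned factors.

```python
import numpy as np

# Shapes of the data-dependent weighting matrices described above:
# S = X^T W is n x q, so S^T S is q x q; R = X H^T is p x q, so
# R^T R is q x q. Their diagonals summarize variance in the
# lower-dimensional projected space.
rng = np.random.default_rng(0)
p, n, q = 644, 200, 16          # ORL-like dimensions, rank q = 16
X = rng.random((p, n))          # stand-in data matrix
W = rng.random((p, q))          # stand-in basis
H = rng.random((q, n))          # stand-in coefficients

S = X.T @ W                     # n x q projection of the data onto W
R = X @ H.T                     # p x q projection weighting
StS = S.T @ S                   # q x q weight entering one update
RtR = R.T @ R                   # q x q weight entering the other
```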

Our objective function is

C(W, H) = ||X − WH||_{F}^{2} + αJ_{1}(W) + βJ_{2}(H), (19)

where

J_{1}(W) = (1/g) tr(S^{T}S) = (1/g) tr(W^{T}XX^{T}W), (20)

and

J_{2}(H) = (1/g) tr(R^{T}R) = (1/g) tr(HX^{T}XH^{T}), (21)

where the g-Prior parameter g is chosen as discussed below.

For (22), the Risk Inflation Criterion (RIC) of Foster and George [ ], which corresponds to g = q^{2}, is combined with the Bayesian Information Criterion (BIC), for which g = n yields the unit information prior, to produce the so-called benchmark prior. And so, (22) will be found to be appropriate.
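If (22) follows the usual benchmark-prior recipe g = max(n, q²), combining RIC's g = q² with BIC's unit-information g = n, then choosing g is a one-liner. The helper below is hypothetical and assumes that form.

```python
def benchmark_g(n, q):
    """Benchmark-prior choice of the g-Prior parameter, assumed here
    to combine RIC (g = q^2) with the unit-information prior of BIC
    (g = n) as g = max(n, q^2). Hypothetical helper; the paper's
    exact (22) may differ."""
    return max(n, q * q)

# ORL training set: n = 200 images. At rank q = 16, q^2 = 256 > n,
# so the RIC term dominates; at small q, g falls back to g = n.
```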

When using ZNMF, the updating equations of CNMF shown in (10) and (11) are modified as follows:

and

In this section, we conducted a series of simulations to evaluate the classification performance of the ZNMF and CNMF algorithms. We replicated the ORL classification experiment conducted in Wang et al. [ ].

The Cambridge ORL database consists of 10 gray-scale facial images each of 36 male and 4 female subjects. The images vary in illumination, facial expression and position. The faces are forward-facing with slight rotations to the left and right. For each simulation, the training dataset X ∈ ℝ^{644×200} was produced by randomly selecting 5 images from each of the 40 subjects, resulting in a training dataset of 200 images of 644 pixels each. The test datasets comprised the remaining 200 unselected images and were used to evaluate the facial recognition capabilities of CNMF and ZNMF using the first nearest neighbor classifier. In order to optimize computational efficiency, the resolution of the images was reduced from 112 × 92 to 28 × 23 in accordance with [ ].
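The first-nearest-neighbor classification step can be sketched as follows. The encoding of a test image against the learned basis uses an unconstrained least-squares projection clipped at zero, which is a stand-in assumption rather than the paper's exact encoding procedure.

```python
import numpy as np

def nn_classify(W, H_train, y_train, x_test):
    """First-nearest-neighbor classification in the NMF coefficient
    space. The test image is encoded against the learned basis W by
    a least-squares projection clipped at zero (a stand-in assumption,
    not necessarily the paper's encoding step), then matched to the
    nearest training coefficient vector (a column of H_train)."""
    h, *_ = np.linalg.lstsq(W, x_test, rcond=None)
    h = np.clip(h, 0.0, None)                       # enforce nonnegativity
    dists = np.linalg.norm(H_train - h[:, None], axis=0)
    return y_train[int(np.argmin(dists))]
```

In the experiment, W and H_train would come from factorizing the 644 × 200 training matrix; here the function only assumes W is p × q, H_train is q × n, and y_train holds the n subject labels.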

The effects of the α, β, and g-Prior parameter settings on the average recognition rate were explored through extensive computer simulations. We restricted the α, β relationship to two possible scenarios:

α = β (25)

and

α = 1 − β, (26)

such that 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1. Optimal α and β settings were determined across all considered factorization ranks q ∈ {16, 25, 36, 49, 64, 81, 100} for the CNMF algorithm simulations using (25) and (26) (see the table below).

Factorization Rank (q) | α and β Relationship | Optimal α | Optimal β | Average Recognition Rate
---|---|---|---|---
16 | α = β | 0.81 | 0.81 | 0.88578
16 | α = 1 − β | 0.15 | 0.85 | 0.88602
25 | α = β | 0.64 | 0.64 | 0.88631
25 | α = 1 − β | 0.39 | 0.61 | 0.88698
36 | α = β | 0.56 | 0.56 | 0.88630
36 | α = 1 − β | 0.82 | 0.18 | 0.88347
49 | α = β | 0.99 | 0.99 | 0.88685
49 | α = 1 − β | 0.82 | 0.18 | 0.88556
64 | α = β | 0.42 | 0.42 | 0.88838
64 | α = 1 − β | 0.46 | 0.54 | 0.88561
81 | α = β | 1.00 | 1.00 | 0.88868
81 | α = 1 − β | 0.14 | 0.86 | 0.88460
100 | α = β | 0.94 | 0.94 | 0.88901
100 | α = 1 − β | 0.12 | 0.88 | 0.88496

Factorization Rank (q) | α and β Relationship | Optimal α | Optimal β | Optimal g-Prior | Average Recognition Rate
---|---|---|---|---|---
16 | α = β | 0.45 | 0.45 | 75 | 0.90283
16 | α = 1 − β | 0.60 | 0.40 | 80 | 0.90080
25 | α = β | 0.45 | 0.45 | 75 | 0.89975
25 | α = 1 − β | 0.60 | 0.40 | 80 | 0.90200
36 | α = β | 0.45 | 0.45 | 75 | 0.90039
36 | α = 1 − β | 0.60 | 0.40 | 80 | 0.89933
49 | α = β | 0.45 | 0.45 | 75 | 0.90093
49 | α = 1 − β | 0.60 | 0.40 | 80 | 0.89952
64 | α = β | 0.45 | 0.45 | 75 | 0.90055
64 | α = 1 − β | 0.60 | 0.40 | 80 | 0.90219
81 | α = β | 0.45 | 0.45 | 75 | 0.90024
81 | α = 1 − β | 0.60 | 0.40 | 80 | 0.89918
100 | α = β | 0.45 | 0.45 | 75 | 0.90111
100 | α = 1 − β | 0.60 | 0.40 | 80 | 0.90012

Because there were far more regularization parameter settings to consider for the ZNMF algorithm than for the CNMF algorithm, α and β were optimized exclusively at factorization rank q = 16 for (25) and (26) in the ZNMF simulations. The optimal tuning values of α and β for the ZNMF simulations were then held constant across the remaining factorization ranks (25, 36, 49, 64, 81, 100), differing only according to the relationship of α with β specified by (25) and (26). Twenty replications were used at each unique setting of the regularization parameters for the CNMF algorithm, while only 5 replications were used at each unique setting in the ZNMF simulations; this difference was again due to the far greater number of parameter settings to explore under ZNMF.
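The tuning protocol above amounts to a grid search averaged over replications. A sketch follows, with a placeholder `run_once` interface standing in for one train/test replication; the interface is an assumption, not the authors' code.

```python
import numpy as np

def grid_search_alpha_beta(run_once, grid, n_reps=5):
    """Average the recognition rate over replications at each
    (alpha, beta) setting and keep the best, mirroring the tuning
    protocol described above. `run_once(alpha, beta, rep)` is a
    placeholder returning one replication's recognition rate."""
    best = None
    for alpha, beta in grid:
        avg = float(np.mean([run_once(alpha, beta, r) for r in range(n_reps)]))
        if best is None or avg > best[0]:
            best = (avg, alpha, beta)
    return best  # (average_rate, alpha, beta)

# Scenario (26): alpha = 1 - beta, over a coarse grid of candidates.
grid = [(a, 1.0 - a) for a in np.linspace(0.0, 1.0, 11)]
```

Once the best (α, β) pair is found at q = 16, the same values are reused at the higher ranks, which is what keeps the ZNMF search tractable despite the extra g-Prior dimension.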

The recognition performances of the ZNMF simulations across various settings of α, β, and the g-Prior are displayed in the accompanying figures.

The corresponding results are shown in the right panel of the accompanying figure.

After identifying optimal settings for the regularization parameters, 500 replications were conducted for both the CNMF and ZNMF algorithms across the factorization ranks q ∈ {16, 25, 36, 49, 64, 81, 100}, using those optimal parameter settings. The results, provided in the accompanying table, compare the average recognition rates of the algorithms.

In this paper, we proposed the ZNMF algorithm and assessed its facial recognition capability using the Cambridge ORL Faces Database. We compared its facial recognition capabilities with those of traditional NMF and several constrained versions of NMF across seven different factorization ranks. We found that the ZNMF algorithm outperformed the other algorithms across the majority of the factorization ranks, most notably at the lower factorization ranks, where the margin of improvement was largest. The FNMF algorithm approximately tied the facial recognition rate of the ZNMF algorithm at three factorization ranks (49, 64 and 81), and the PNMF algorithm approximately tied the ZNMF algorithm at just one factorization rank (49). Quite possibly the most important finding was that the ZNMF algorithm produced facial recognition rates, using less information (lower factorization ranks), that either outperformed or were comparable to the results of other algorithms at higher factorization ranks. This finding implies that, for the ORL dataset, the data-dependent ZNMF algorithm can classify facial images better than the other algorithms under examination, and can do so with less information, making it computationally less taxing.

This paper demonstrates the advantages of including data-dependent auxiliary constraints in the NMF algorithm through the introduction of ZNMF. In the future, we hope to explore other data-dependent auxiliary constraints. A possibility would be to use SS^{T}, which is n × n and is somewhat the linear Gram matrix of the projected X, and RR^{T}, which is p × p and somewhat mirrors the covariance matrix in input space. We hope to explore these auxiliary constraints in the near future, again using the Cambridge ORL database and perhaps the Facial Recognition Technology (FERET) database as well.

Ernest Fokoué wishes to express his heartfelt gratitude and infinite thanks to our lady of perpetual help for her ever-present support and guidance, especially for the uninterrupted flow of inspiration received through her most powerful intercession.

Matthew A. Corsetti, Ernest Fokoué (2015) Nonnegative Matrix Factorization with Zellner Penalty. Open Journal of Statistics, 05, 777-786. doi: 10.4236/ojs.2015.57077