def fit(self, X, aL):
    # Find the unique sentence types; train one weight vector per type
    # (one versus the rest), each with a constant term in position 0.
    self.ansList = np.unique(aL)
    self.ws = np.zeros((len(self.ansList), 1 + X.shape[1]))
    for k in range(len(self.ansList)):
        # Binary labels: 1 where the sentence has type ansList[k], else 0.
        y = np.copy(aL)
        y = np.where(y == self.ansList[k], 1, 0)
        self.fit_helper(X, y, self.ws[k])

def predict(self, X):
    numSamples = X.shape[0]
    # Probability of each sentence type for each sample sentence.
    self.probs = np.zeros((len(self.ansList), numSamples))
    for k in range(len(self.ansList)):
        self.probs[k] = self.activationProb(X, self.ws[k])
    self.results = [""] * numSamples
    for i in range(numSamples):
        # Assign the sentence type with the greatest probability.
        maxk = 0
        for k in range(len(self.ansList)):
            if self.probs[k][i] >= self.probs[maxk][i]:
                maxk = k
        self.results[i] = self.ansList[maxk]
    return np.array(self.results)
Regarding the three sample court decisions, the parameter w of
the function activationProb is a one-dimensional Numpy array of
1862 floating-point numbers. w[1:], of size 1861, stores the current
weights for the 1861 key words, and w[0] is the constant term of the
net input z. The function activationProb first calculates the net
input z by computing the dot product of the rows of X and w[1:],
plus the constant term w[0]. In the next step the function calculates
the probabilities ϕ(z) for the sample sentences (rows) of X based on
the net input z. (Technically, the function calculates the conditional
probability that an input sentence has a certain sentence type given
the sample sentences in X. However, they are simply called
probabilities in this paper to avoid being verbose.) It then returns
the one-dimensional Numpy array phiOfz, which holds the activation
probability for each sample sentence in X.
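The function activationProb itself is not listed in this section; the
following is a minimal sketch consistent with the description above,
assuming the standard logistic sigmoid as the activation function.

    # A sketch of activationProb, assuming the logistic sigmoid; the
    # paper's actual implementation may differ in details.
    def activationProb(self, X, w):
        # Net input z: dot product of each row of X with the weights
        # w[1:], plus the constant term w[0].
        z = np.dot(X, w[1:]) + w[0]
        # Activation probability phi(z) for each sample sentence of X.
        phiOfz = 1.0 / (1.0 + np.exp(-z))
        return phiOfz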
The function fit_helper receives three parameters: the same two-
dimensional Numpy array X, the one-dimensional Numpy array y
that stores the correct outputs for the sample sentences of X, and
the one-dimensional Numpy array w that holds the constant term
and the weights calculated for the 1861 key words. The function
first calls activationProb to calculate the probabilities for the
sample sentences of X. It then calculates the output for each sample
sentence: 1 if the probability ≥ 0.5; 0 otherwise. The errors for the
sample sentences are then calculated accordingly. The most important
part of the function is the calculation of the slopes (partial
derivatives) with respect to the current weights, which are updated
in the next line. The constant term w[0] is updated after that.
Since there are 1861 weights, there are 1861 slopes. Ideally, each
slope would become zero during training, but this is practically
impossible. We therefore allow a small discrepancy per slope, which
is stored in self.slopeErrorAllowance. The total allowance for the
1861 slopes is thus self.slopeErrorAllowance * 1861 (where 1861 =
X.shape[1]), which is stored in stopVal. If the sum of the absolute
values of the 1861 slopes falls below stopVal, the function exits the
for loop and no more updates are necessary. Otherwise, the for loop
continues for self.n_iter iterations, a number that is set to 500.
Note that the learning rate self.eta is set to 0.0001.
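Putting the two preceding paragraphs together, the following is a
minimal sketch of fit_helper under the stated assumptions; the exact
update rule in the paper's code may differ.

    # A sketch of fit_helper consistent with the description above.
    def fit_helper(self, X, y, w):
        stopVal = self.slopeErrorAllowance * X.shape[1]
        for _ in range(self.n_iter):              # self.n_iter = 500
            phiOfz = self.activationProb(X, w)
            # Output: 1 if the probability >= 0.5, 0 otherwise.
            output = np.where(phiOfz >= 0.5, 1, 0)
            errors = y - output
            # Slopes (partial derivatives) with respect to the
            # 1861 weights, one per key word.
            slopes = np.dot(X.T, errors)
            w[1:] += self.eta * slopes            # self.eta = 0.0001
            w[0] += self.eta * errors.sum()       # update constant term
            # Stop once the slopes are collectively small enough.
            if np.sum(np.abs(slopes)) < stopVal:
                break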
Table 1: Comparison of our implementation and scikit-learn's
implementation of the logistic regression algorithm

Methods          Tests   Correct Guesses   Total Guesses
Ours              100         8707             15900
scikit-learn's    100         8454             15900
The function fit accepts two parameters: X, the two-dimensional
Numpy array, and aL, the one-dimensional array that stores the
correct sentence types of the sample sentences of X. It first finds
the unique sentence types in aL. Then, it applies the one-versus-the-
rest approach to each unique sentence type to calculate the weights
and the constant term for the 1861 key words. To do so, each
appearance of the sentence type in y, which is a copy of aL, is
replaced with a 1, and every other entry with a 0. The function fit
then calls self.fit_helper to calculate the 1861 weights and the
constant term for that particular sentence type, which are later used
to make predictions.
The calculated weights and constant term for each sentence type are
stored in self.ws, which is a 6 × 1862 Numpy array. To make a
prediction for a sentence, we use the predict function, which
calculates the probability of each sentence type for the input
sentence and assigns to the sentence the sentence type that has the
greatest probability.
4.2.3 scikit-learn's implementation of the logistic regression
algorithm. scikit-learn's implementation of the logistic regression
algorithm can be applied straightforwardly. We first randomly shuffle
the 529 sentences and their corresponding sentence types. We then
apply the function train_test_split to split the 529 sentences and
their sentence types into two sets: 370 training sentences and 159
testing sentences. Afterwards, a logistic regression object is
created and fitted with the training data.
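The following sketch illustrates this workflow; the data here are
random stand-ins for the paper's keyword matrix and labels (the
shapes are taken from the text, and the six type names are
placeholders, not the paper's actual sentence types).

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Illustrative stand-ins: a 529 x 1861 keyword matrix X and the
    # 529 sentence-type labels aL (placeholder type names).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(529, 1861))
    aL = rng.choice(["type%d" % k for k in range(6)], size=529)

    # train_test_split shuffles by default before splitting into the
    # 370 training and 159 testing sentences.
    X_train, X_test, y_train, y_test = train_test_split(
        X, aL, train_size=370, test_size=159)

    clf = LogisticRegression(max_iter=1000)  # scikit-learn's classifier
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)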
4.2.4 Discussions. To compare our implementation with that of
scikit-learn for the logistic regression algorithm, we randomly
select 370 training sentences and 159 testing sentences from the 529
sentences of the three sample court decisions and feed them to each
implementation; this test is repeated 100 times, for a total of
100 × 159 = 15,900 guesses. The results are shown in Table 1, which
indicates that our implementation makes more correct predictions than
the highly optimized implementation of scikit-learn. Moreover, since
we are familiar with our code, our implementation can serve as a
platform for future improvements.
Our result is remarkable when compared with randomly guessing the
sentence types of the 159 testing sentences, with six possibilities
for each sentence. Note that the probability of randomly making a
correct guess is p = 1/6 ≈ 0.166667. Assuming each guess is an
independent trial, the number of correct guesses among the 159
sentences follows the binomial distribution; our implementation
averages 8707/100 ≈ 87 correct predictions per test. Thus, applying
Microsoft Excel's binomial distribution statistical functions
BINOM.DIST and BINOM.DIST.RANGE to the events of making exactly 87,
and 87 or more, correct predictions, we obtain the probabilities in
Table 2.
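The same probabilities can be cross-checked with SciPy's binomial
distribution; the correspondence to the Excel functions below is our
reading of their documentation.

    # Cross-check of the Excel computation: binom.pmf mirrors
    # BINOM.DIST(87, 159, 1/6, FALSE), and binom.sf mirrors
    # BINOM.DIST.RANGE(159, 1/6, 87, 159).
    from scipy.stats import binom

    n, p = 159, 1 / 6
    print(binom.pmf(87, n, p))   # P(exactly 87 correct guesses)
    print(binom.sf(86, n, p))    # P(87 or more correct guesses)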
The probabilities of both events are close to zero. Hence, our
keyword approach to capturing the characteristics of the sentences of
breach-of-contract court decisions is on the right track, although
much room for improvement remains.