Mathematical modelling of the performance of a student in non-collaborative and non-presential learning

In this paper we propose a model to study the appropriation of knowledge by a single student in a non-collaborative online class. We formulate a stochastic model based on the quality of the teacher's class and the student's affinity for understanding the sessions, under the assumption that previous sessions influence the understanding of later ones. This assumption implies that the process is not a Markov process. This kind of situation arises in seminars and classes with many different sessions. We derive recursive expressions for the distribution of the number of sessions that the student comprehends. Furthermore, we study the convergence of this distribution and examine the speed of this convergence through numerical examples.


Introduction
Simple methods from physics and mathematics have recently been adopted to model, as mathematical metaphors, a wide range of social phenomena and social systems [1,4,6,8,9,11]. However, to the best of our knowledge, models of this kind have not been comprehensively used to study the way students learn in a given class. How the behaviour of students affects how well they learn is an old question that has been widely studied in several contexts ever since people became concerned with teaching and pedagogy [2,5,7,10]. Currently, in the context of the forced transition to online education due to the worldwide health alert, answering this question and other closely related problems could legitimize existing learning techniques or give birth to new ones. After this epidemic it can be expected that more capital will be invested in online teaching and blended learning. Even though moving to a fully online or blended learning model will be a long process, it is important to study the intrinsic behaviour of students in order to provide better answers to the challenges ahead.
In this work we aim to answer the following question:
• How does a student who does not interact with others come to understand the sessions of a course, according to their affinity for those sessions?
Although general behaviours could be modelled as particles or agents (students) that interact and exchange "something" (knowledge), the current environment of social distancing limits such interaction, and it is well known that many students have resented this "solitude", particularly in the classroom. This solitude presents new opportunities and challenges when trying to measure how well a student understands a class session. In the present work we model the appropriation of knowledge by a student who does not interact with classmates, where the ease of appropriation varies with the student's understanding throughout the course and with the complexity of each session.
We assume that the course is presented in a number of sessions such that session j helps the student's understanding of session j + 1. This assumption implies that the model is not a Markov process [3], because the understanding of session j depends on the outcomes of all j − 1 previous sessions.
This work is organized as follows: A general description of the model is given in Section 2. The mathematical manipulation of the model and some results are presented in Section 3. Afterwards, we study some particular cases in Section 4 and present some numerical examples of the model. Finally, conclusions and future work appear in Section 5.

Description of the Model
We consider the situation in which the course consists of n sessions and the learning process of each student in the same course is independent of the other students.
Under this situation, we assume that the lecturer teaches each session according to a measurable parameter q, which represents the quality of the sessions. We therefore refer to q as the quality parameter. In this work we only consider the case when this parameter remains constant throughout the whole course.
The event in which the student understands the first session has probability $\bar F(1 - q)$, for a certain probability distribution $F$, where $\bar F := 1 - F$.
From the second session until the end of the course, if the student understood session $j$, they understand the next one with probability $\bar F_{j+1}(1 - q) := \bar F_j(1 - q - \varepsilon)$. Here $F_1$ refers to some initial distribution $F$ and each $F_j$ for $j \ge 2$ is constructed conditioned on the result of all the previous $j - 1$ sessions.
Similarly, if the student did not understand session $j$, they understand the next one with probability $\bar F_{j+1}(1 - q) := \bar F_j(1 - q + \varepsilon)$. The parameter ε is assumed to be positive and fixed during the entire course, and it reflects how the comprehension of the content of a session influences the next session. Hence we call ε the dependence parameter. In some courses this dependence parameter will be relatively large in comparison with n, as in a Calculus class. In many other cases, ε will be relatively small compared to n, as in seminars or panoramic courses.
We wish to avoid the situation in which, after some point, the student understands the last sessions of the course with probability 1 because of the cumulative effect of ε. We therefore assume that ε is o(f(n)), for a properly chosen function f of the total number of sessions n.
In the following sections we manipulate this model to obtain some results related to the distribution of the number of sessions that the student understands along the course.
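The update rule described above can be sketched as a short simulation. This is a minimal illustration, not the authors' code: it assumes F is the Uniform(0, 1) distribution, and the names `uniform_tail` and `simulate_course` are hypothetical helpers introduced here.

```python
import random


def uniform_tail(x):
    """Tail 1 - F(x) of Uniform(0, 1); an assumed choice of F for illustration."""
    return 1.0 - min(max(x, 0.0), 1.0)


def simulate_course(n, q, eps, tail=uniform_tail, rng=None):
    """Simulate one student over n sessions; return the list of 0/1 outcomes."""
    rng = rng or random.Random()
    x = 1.0 - q  # argument of the tail for session 1
    outcomes = []
    for _ in range(n):
        understood = rng.random() < tail(x)
        outcomes.append(1 if understood else 0)
        # Understanding a session lowers the argument by eps (raising the
        # chance of understanding the next one); failing raises it by eps.
        x += -eps if understood else eps
    return outcomes


if __name__ == "__main__":
    n, q = 20, 0.6
    path = simulate_course(n, q, eps=1.0 / n**2, rng=random.Random(0))
    print(path, "sessions understood:", sum(path))
```

Passing a seeded `random.Random` makes a run reproducible; the sum of the returned list is one draw of the number of understood sessions.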

Main results
We let $Y(j)$ be Bernoulli random variables defined as follows: For session 1, $Y(1)$ is Bernoulli with parameter $\bar F(1 - q)$ for the given initial continuous distribution $F$.
For session 2, by the Law of Total Probability we have:
$$P[Y(2) = 1] = P[Y(2) = 1 \mid Y(1) = 1]\, P[Y(1) = 1] + P[Y(2) = 1 \mid Y(1) = 0]\, P[Y(1) = 0]. \quad (1)$$
Given that the student understood session 1, the probability that they understand session 2 is given by
$$P[Y(2) = 1 \mid Y(1) = 1] = \bar F(1 - q - \varepsilon).$$
Similarly, if the student did not understand session 1, the probability that they understand session 2 is
$$P[Y(2) = 1 \mid Y(1) = 0] = \bar F(1 - q + \varepsilon).$$
It follows that (1) is equivalent to
$$P[Y(2) = 1] = \bar F(1 - q - \varepsilon)\, \bar F(1 - q) + \bar F(1 - q + \varepsilon)\, F(1 - q).$$
For the general setting, we denote by $F_j$ the probability distribution such that provided that $F_{j-1}(x - \varepsilon)$ and $F_{j-1}(x + \varepsilon)$ do not equal zero or one. In the equation above, Using this notation we obtain We derive a recursive expression for the probabilities $\{p_m, 1 < m \le n\}$, which is given in the following result.
Theorem 3.1 Let n and ε be such that (n − 1)ε < min{1 − q, q}. Then the general expression for $\{p_m, 1 < m \le n\}$ reads
Proof. The result holds for m = 1 by the definition of $p_1$. We proceed by induction, assuming that for an integer k ≥ 1,

By equation (3)
hence we obtain from the induction hypothesis The result now follows.
From this point on, we drop the notation $F_1$ and write $F$ instead.
We let $B_n$ denote the number of sessions, not necessarily consecutive, out of a total of n that the student understood. We are interested in the probability function of $B_n$, $\{P[B_n = k], 0 \le k \le n\}$, for which we first consider the following particular cases.
• Case 1: n = 3 and k = 0. This is the case when the student understands 0 sessions out of 3. The probability reads:
$$P[B_3 = 0] = F(1 - q)\, F(1 - q + \varepsilon)\, F(1 - q + 2\varepsilon).$$
• Case 2: n = 3 and k = 1. If the student understands only one session, the probability is the sum of the following cases:
2.1 The student understands only the first session.
2.2 The student understands only the second session.
2.3 The student understands only the third session.
Note that the probabilities in cases 2.1 and 2.2 add up to $P[B_2 = 1]\, F(1 - q)$ and the probability of case 2.3 equals $P[B_2 = 0]\, \bar F(1 - q + 2\varepsilon)$. Hence
$$P[B_3 = 1] = P[B_2 = 1]\, F(1 - q) + P[B_2 = 0]\, \bar F(1 - q + 2\varepsilon).$$
• Case 3: n = 3 and k = 2. If the student understands two sessions, the probability is the sum of the following cases:
3.1 The student understands the first and second sessions but not the third.
3.2 The student understands the second and third sessions but not the first.
3.3 The student understands the first and third sessions but not the second.
Note that cases 3.2 and 3.3 add up to $P[B_2 = 1]\, \bar F(1 - q)$ and case 3.1 equals $P[B_2 = 2]\, F(1 - q - 2\varepsilon)$.
• Case 4: n = 3 and k = 3. This is the case when the student understands all sessions. This probability is given by
$$P[B_3 = 3] = \bar F(1 - q)\, \bar F(1 - q - \varepsilon)\, \bar F(1 - q - 2\varepsilon).$$
The recursive behaviour observed in the probability function of $B_3$ is generalized in the following theorem.
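The four cases for n = 3 can be checked numerically by summing over all configurations. The sketch below is an illustration only: it assumes F is Uniform(0, 1), and `uniform_tail` and `exact_distribution` are hypothetical names introduced here.

```python
from itertools import product


def uniform_tail(x):
    """Tail 1 - F(x) of Uniform(0, 1); an assumed choice of F for illustration."""
    return 1.0 - min(max(x, 0.0), 1.0)


def exact_distribution(n, q, eps, tail=uniform_tail):
    """Return [P[B_n = k] for k = 0..n] by summing over all 2**n configurations."""
    dist = [0.0] * (n + 1)
    for config in product((0, 1), repeat=n):
        prob, x = 1.0, 1.0 - q
        for y in config:
            p_understand = tail(x)
            prob *= p_understand if y else 1.0 - p_understand
            # Shift the argument for the next session according to the outcome.
            x += -eps if y else eps
        dist[sum(config)] += prob
    return dist


if __name__ == "__main__":
    q, eps = 0.4, 0.05
    for k, p in enumerate(exact_distribution(3, q, eps)):
        print(f"P[B_3 = {k}] = {p:.6f}")
```

Brute-force enumeration costs 2^n evaluations, so it is only practical for small n; its value here is as an independent check of the case-by-case formulas.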
Theorem 3.2 Let n ≥ 3 be an integer. The probability function of the random variable $B_n$ satisfies the following relations.
Proof.
1. If the student has not understood any of the first n − 1 sessions from a total of n, it follows from the construction of the model that the student's parameter for understanding the nth session becomes $1 - q + (n - 1)\varepsilon$. Hence
$$P[B_n = 0] = P[B_{n-1} = 0]\, F(1 - q + (n - 1)\varepsilon).$$
Let $A_n$ denote the number of sessions that the student has not understood from a total of n. Then the event $\{B_n = n\}$ is the same as $\{A_n = 0\}$ and hence $P[B_n = n] = P[A_n = 0]$. Now the result in 1 yields
$$P[B_n = n] = P[B_{n-1} = n - 1]\, \bar F(1 - q - (n - 1)\varepsilon).$$
Note that, given the configuration $Y(1) = x_1, \dots, Y(n-1) = x_{n-1}$ in which the student understood exactly k sessions, we have added ε to the quality parameter q exactly k times. Moreover, the maximum number of times we may add or subtract ε in a total of n sessions equals n − 1, since we do not add or subtract anything in session 1. Hence if the student understood k of the first n − 1 sessions, they did not understand the other n − 1 − k and we have subtracted ε exactly n − 1 − k times. This means that understanding session n depends on the parameter
$$1 - \bigl(q + k\varepsilon - (n - 1 - k)\varepsilon\bigr) = 1 - q + (n - 1 - 2k)\varepsilon.$$
It follows that the student does not understand session n with probability $F(1 - q + (n - 1 - 2k)\varepsilon)$.
Since this holds for any given configuration $Y(1) = x_1, \dots, Y(n-1) = x_{n-1}$ in which the student has understood exactly k sessions, we obtain Analogously The result follows by substituting equations (5) and (6) in equation (4).
Equations in Theorem 3.2 can be written as a single matrix equation. Let $B_n \in M_{1,n+1}$ be given by and denote by $M_n \in M_{n+1,n}$ the matrix such that Using the notation above we note that where $B_0 = (F(1 - q), \bar F(1 - q))$. This representation is used for some numerical examples in Section 5.
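The one-step structure behind these relations can be sketched as a row-vector update: with k sessions understood among the first m − 1, the update rule of Section 2 gives a net shift of (m − 1 − 2k)ε for session m. The code below is an illustration under that reading, assuming F is Uniform(0, 1); `distribution_by_recursion` is a hypothetical name and does not reproduce the authors' matrix notation.

```python
def uniform_tail(x):
    """Tail 1 - F(x) of Uniform(0, 1); an assumed choice of F for illustration."""
    return 1.0 - min(max(x, 0.0), 1.0)


def distribution_by_recursion(n, q, eps, tail=uniform_tail):
    """Return [P[B_n = k] for k = 0..n] by extending one session at a time."""
    p1 = tail(1.0 - q)
    dist = [1.0 - p1, p1]  # distribution of B_1
    for m in range(2, n + 1):  # go from m - 1 sessions to m sessions
        new = [0.0] * (m + 1)
        for k, mass in enumerate(dist):  # k understood among the first m - 1
            # k successes and m - 1 - k failures give a net shift (m - 1 - 2k) eps.
            p_next = tail(1.0 - q + (m - 1 - 2 * k) * eps)
            new[k] += mass * (1.0 - p_next)  # session m not understood
            new[k + 1] += mass * p_next  # session m understood
        dist = new
    return dist


if __name__ == "__main__":
    for k, p in enumerate(distribution_by_recursion(3, 0.4, 0.05)):
        print(f"P[B_3 = {k}] = {p:.6f}")
```

Because the shift depends on the configuration only through k, the update needs O(n²) operations in total rather than a sum over 2^n configurations.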
The explicit distribution of $B_n$ is not easy to obtain even in simple cases (such as when F is a uniform distribution). Nevertheless, in the following result we provide a simple asymptotic expression for this distribution.
Theorem 3.3 Let $\{p_n(k), k = 0, 1, \dots, n\}$ denote the probability function of a Binomial$(n, p)$ distribution, with $p := \bar F(1 - q)$. Suppose $n$, $\varepsilon$ are such that $n^2 \varepsilon \to 0$ as $n \to \infty$ and $F$ is absolutely continuous with density $f$ such that $f$ is continuous at $1 - q$. Then,
Proof. First we prove the case when $k \notin \{0, n\}$. Following the arguments in the proof of Theorem 3.2, we have where From the hypothesis $n^2 \varepsilon \to 0$ it follows that $\varepsilon \to 0$. Hence, by the assumption of continuity of $F$, the following convergences as $n \to \infty$ hold: Note from the matrix representation given in (7) that where (−1) xa , and $v_1(j) = 1$ if the student understood session $j$.
We have the following bounds for $p_{n-1,k}$: Since the terms with the tail $\bar F$ converge to $\bar F(1 - q)$ and their exponents do not depend on $n$, we only need to prove that or equivalently Using the hypothesis $n^2 \varepsilon \to 0$ we may write $\varepsilon = c n^{-2-\eta}$ for some $c, \eta > 0$. Applying L'Hôpital's rule we obtain From the limit above and (9) it follows that Now let us consider the term Since $|U_k(n - 1)| = \binom{n-1}{k}$, using the result in equation (10), for an arbitrary $\beta > 0$ and sufficiently large $n$ we have The result follows by letting $n \to \infty$ and $\beta \to 0$. The result for the second term in equation (8) is obtained analogously. Now we proceed in a similar way to prove that $P[B_n = 0]$ is asymptotically equivalent to $p_n(0)$. It can easily be checked that Using the representation $\varepsilon = c n^{-\eta-2}$ and L'Hôpital's rule again, we obtain The proof for $P[B_n = n]$ is analogous.
Theorem 3.3 says that, when the dependence parameter shrinks fast enough as the number of sessions grows ($n^2 \varepsilon \to 0$), the dependence between sessions becomes less relevant. Hence the number of sessions that the student understands behaves like a binomial random variable, in which successes occur independently.
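A rough heuristic, offered here as a sketch and not as the authors' proof, indicates why the scale $n^2 \varepsilon$ is the natural one. The symbols $C$ and $c$ are introduced only for this illustration: $C$ bounds the density $f$ in a neighbourhood of $1 - q$ (such a bound exists because $f$ is continuous there), and we assume $0 < p < 1$ and $c\, n \varepsilon \le 1$. Every factor in the probability of a configuration is $\bar F$ or $F$ evaluated at $1 - q + s\varepsilon$ with $|s| \le n - 1$, so each factor differs from $p$ or $1 - p$ by a relative error of at most $c\, n \varepsilon$, and the product of $n$ such factors stays within a factor $1 + O(n^2 \varepsilon)$ of the independent case.

```latex
\[
\bigl|\bar F(1-q+s\varepsilon)-\bar F(1-q)\bigr|
  \;\le\; C\,|s|\,\varepsilon \;\le\; C\,(n-1)\,\varepsilon ,
\qquad |s|\le n-1 ,
\]
\[
1 - c\,n^{2}\varepsilon
  \;\le\; (1-c\,n\varepsilon)^{n}
  \;\le\; \frac{P[Y(1)=x_1,\dots,Y(n)=x_n]}{p^{k}(1-p)^{n-k}}
  \;\le\; (1+c\,n\varepsilon)^{n}
  \;\le\; e^{\,c\,n^{2}\varepsilon},
\qquad c=\frac{C}{\min\{p,\,1-p\}} ,
\]
\[
\text{so that}\quad
\frac{P[B_n=k]}{p_n(k)} \longrightarrow 1
\quad\text{whenever } n^{2}\varepsilon \to 0 ,
\]
```

Here $k = x_1 + \dots + x_n$, and the last line follows by summing the middle bound over the $\binom{n}{k}$ configurations with exactly $k$ understood sessions.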
In this section we present some important quantities in the particular case when the student's initial distribution for understanding is uniform on [0, 1]. Throughout this section we assume that ε is such that , which we name Hypothesis 1.
Proposition 4.1 Let F be the uniform distribution over (0, 1) and assume Hypothesis 1 holds. Then for i ≤ n we have
Proof. Recall that $F_1 = F$. The result holds for i = 1, since We proceed by induction, assuming that for an integer j ≥ 1 By the Law of Total Probability and the induction hypothesis Hence the result follows.
Theorem 4.1 Let F be the uniform distribution over (0, 1) and assume Hypothesis 1 holds. Then for m ≤ n we have
Proof. For 1 ≤ j ≤ m − 1, it follows from Proposition 4.1 that Since $\bar F(1 - q) = q$, by Theorem 3.1 we have Using the identities the right-hand side of equation (11) becomes [2ε] Hence we obtain And the result follows.
Figure 1: Simulations of $B_n$ (dark line) and its approximating binomial distribution (light line). In all the plots we consider $\varepsilon = 1/n^2$, with n = 10 in blue; n = 30 in red; n = 60 in green and n = 100 in purple. Plot (a) is made for q = 0.2; Plot (b) for q = 0.5 and Plot (c) for q = 0.8.
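In the uniform setting the marginal probabilities P[Y(m) = 1] can be checked numerically. The sketch below is a consistency check and not a restatement of Theorem 4.1: the closed form in `marginals_closed_form` is derived here from the linearity of the uniform tail (each understood session adds ε to the next success probability and each failure subtracts ε, which gives P[Y(m+1) = 1] = (1 + 2ε) P[Y(m) = 1] − ε), and it assumes (n − 1)ε < min{q, 1 − q} so that no probability is clipped. Both function names are hypothetical.

```python
from itertools import product


def marginals_by_enumeration(n, q, eps):
    """P[Y(m) = 1] for m = 1..n under Uniform(0, 1) F, by brute force."""
    marg = [0.0] * n
    for config in product((0, 1), repeat=n):
        prob, p_next = 1.0, q  # uniform tail at 1 - q equals q
        for y in config:
            prob *= p_next if y else 1.0 - p_next
            p_next += eps if y else -eps
        for m, y in enumerate(config):
            if y:
                marg[m] += prob
    return marg


def marginals_closed_form(n, q, eps):
    """Solution of p_{m+1} = (1 + 2 eps) p_m - eps with p_1 = q (derived here)."""
    return [0.5 + (q - 0.5) * (1.0 + 2.0 * eps) ** (m - 1) for m in range(1, n + 1)]


if __name__ == "__main__":
    n, q, eps = 6, 0.3, 0.02
    for m, (a, b) in enumerate(
        zip(marginals_by_enumeration(n, q, eps), marginals_closed_form(n, q, eps)), 1
    ):
        print(f"m={m}: enumeration={a:.6f}  closed form={b:.6f}")
```

Under this reading, q = 0.5 is a fixed point of the recursion, while for q ≠ 0.5 the marginal probability drifts away from 0.5 as m grows.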
As we previously mentioned, an analytic expression for the exact distribution of B n is not easy to obtain even in this simple case when F is the uniform distribution.However, in Theorem 3.3 we have seen that B n behaves asymptotically like a binomial random variable.We present some numerical examples of the performance of this approximation, using different values of n, q and ε.These examples show that the two distributions are very close even for values of n which are not so large.
In Figure 1 we fix the value of q, we simulate the exact distribution and compare it to the approximating binomial distribution.This is made for n = 10, 30, 60, 100 (plots in blue, red, green and purple respectively).The light line corresponds to the approximating binomial distribution while the dark line represents the simulated exact distribution of B n .
As can be seen in the plots, the convergence to the binomial distribution is quite fast, and it is fastest when q = 0.5. Moreover, it is seen numerically that for any q, when n ≥ 40, $P[B_n = k] / p_n(k) \ge 0.95$. Our numerical examples show that the speed of convergence depends on the value of q, as can be seen in Figure 2, where we use different values of q with fixed n and compare the exact distribution of $B_n$ to the approximating binomial distribution.
Another point worth mentioning is that the distributions behave symmetrically with respect to q = 0.5.In this case, the convergence to the binomial distribution is faster than in the other cases.In fact, the cases when q is nearly 0 or 1, present a slower convergence to the binomial distribution.This may imply that the dependence is stronger when the quality of the class is low or high.
Figure 2: Simulations of $B_n$ (dark line) and its approximating binomial distribution (light line). In all the plots we consider $\varepsilon = 1/n^2$, with q = 0.1 in blue; q = 0.3 in red; q = 0.5 in green; q = 0.7 in purple and q = 0.9 in yellow. Plot (a) is made for n = 10; Plot (b) for n = 30; Plot (c) for n = 60 and Plot (d) for n = 100.
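A comparison in the spirit of Figures 1 and 2 can be sketched without plotting. The code below is an illustration and not the authors' simulation code: it assumes F is Uniform(0, 1) and ε = 1/n², computes the distribution of B_n with a one-step recursion on the number of understood sessions, and summarizes the gap to the Binomial(n, q) distribution by the total variation distance, a metric chosen here for convenience. All function names are hypothetical.

```python
from math import comb


def uniform_tail(x):
    """Tail 1 - F(x) of Uniform(0, 1); an assumed choice of F for illustration."""
    return 1.0 - min(max(x, 0.0), 1.0)


def exact_dist(n, q, eps, tail=uniform_tail):
    """Return [P[B_n = k] for k = 0..n] via a recursion on the success count."""
    p1 = tail(1.0 - q)
    dist = [1.0 - p1, p1]
    for m in range(2, n + 1):
        new = [0.0] * (m + 1)
        for k, mass in enumerate(dist):
            # k successes among the first m - 1 sessions shift the argument
            # of the tail by (m - 1 - 2k) eps.
            p_next = tail(1.0 - q + (m - 1 - 2 * k) * eps)
            new[k] += mass * (1.0 - p_next)
            new[k + 1] += mass * p_next
        dist = new
    return dist


def binomial_pmf(n, p):
    """Probability function of Binomial(n, p)."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def total_variation(a, b):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    for n in (10, 30, 60, 100):
        eps = 1.0 / n**2
        cells = []
        for q in (0.1, 0.3, 0.5, 0.7, 0.9):
            tv = total_variation(exact_dist(n, q, eps), binomial_pmf(n, uniform_tail(1.0 - q)))
            cells.append(f"q={q}: {tv:.4f}")
        print(f"n={n:3d}  " + "  ".join(cells))
```

Replacing `uniform_tail` with another tail function, or ε = 1/n² with another decay rate, gives a quick way to explore how the approximation behaves outside the cases plotted in the figures.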

Conclusions and Future work
In this work we present a model to study how a student's understanding evolves over several sessions of a course taught entirely online and with no interaction with other students. In particular we study the case where the dependence between sessions is relatively small compared to the total number of sessions, as in seminars or panoramic courses. We obtained a recursive expression for the distribution of the number of sessions that the student understands throughout the course and showed that when the dependence parameter is small, this distribution has a binomial approximation. The speed of convergence of the approximation depends on the number of sessions and on their quality. Even though this is a simple model, it can be fruitfully extended to consider many more situations, some of which we list below:
• The environment we considered assumes the student has no interaction with their classmates. However, several studies have shown that collaborative learning provides better results for students. It would be very interesting to modify the model to consider this situation.
• We studied the case when ε is constant and relatively small compared to n, but this may not always be the case, as in some science classes.Therefore it would be useful to consider the cases when ε changes according to the sessions themselves or according to the number of sessions previously understood.
• All the results obtained in this work were derived for constant q, but the value of q can vary throughout the course due to the exhaustion and motivation of the student and the teacher.
• Even though the distribution of B n can be approximated by a binomial distribution, we were not able to provide similar results for the corresponding mean and variance.
• It is interesting to test the model with real world data and study or develop some statistical procedures in order for the model to be fitted and validated.