ANSWERS: 2
-
The median is the 'middle value' in a sorted list of values, the mean is the total of all the values divided by the number of values, and the mode is the most common value.
-
In probability theory and statistics, a median is the number separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest to highest and picking the middle one. If there is an even number of observations, the median is not unique, so one often takes the mean of the two middle values. At most half the population has values less than the median, and at most half has values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c, i.e. (b + c)/2.
The median can be used when a distribution is skewed, when end values are not known, or when less importance is attached to outliers, e.g. because they may be measurement errors. A disadvantage is that the median is difficult to handle theoretically.
Efficient computation of the sample median: Although sorting n items generally takes O(n log n) operations, a "divide and conquer" selection algorithm can compute the median of n items in only O(n) operations. The same method finds the k-th smallest element of any list of values; this is called the selection problem.
Easy explanation of the sample median, for an odd number of values: As an example, we will calculate the median of the following population of numbers: 1, 5, 2, 8, 7. Start by sorting the numbers: 1, 2, 5, 7, 8. In this case, 5 is the median, because it is the middle number once the numbers are sorted.
For an even number of values: As an example, we will calculate the median of the following population of numbers: 1, 5, 2, 10, 8, 7. Again, start by sorting the numbers: 1, 2, 5, 7, 8, 10.
In this case, both 5 and 7, and all numbers between 5 and 7, are medians of the data points. Sometimes one takes the average of the two middle numbers to get a unique value: (5 + 7)/2 = 12/2 = 6.
In statistics, mean has two related meanings:
- the arithmetic mean (as distinguished from the geometric mean or harmonic mean).
- the expected value of a random variable, which is also called the population mean.
It is sometimes stated that 'mean' means average. This is incorrect if "mean" is taken in the specific sense of "arithmetic mean", as there are different types of averages: the mean, median, and mode. Other simple statistical analyses use measures of spread, such as range, interquartile range, or standard deviation.
For a real-valued random variable X, the mean is the expectation of X. Note that not every probability distribution has a defined mean (or variance); see the Cauchy distribution for an example. For a data set, the mean is the sum of the observations divided by the number of observations. The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar".
The mean is often quoted along with the standard deviation: the mean describes the central location of the data, and the standard deviation describes the spread. An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less mathematically tractable. Beyond statistics, means are often used in geometry and analysis; a wide range of means have been developed for these purposes, which are not much used in statistics.
In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score.
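The median and mean calculations described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `mean`, `median`, and `select_kth` are hypothetical and do not come from the answer. `select_kth` uses randomized quickselect as one possible "divide and conquer" selection method; its running time is linear on average, not in the worst case.

```python
import random


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


def median(values):
    """Median by sorting; averages the two middle values for an even count."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def select_kth(values, k):
    """Return the k-th smallest element (0-based) without a full sort.

    Assumption: randomized quickselect, chosen here for brevity.
    """
    if not 0 <= k < len(values):
        raise IndexError("k is out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        larger = [v for v in items if v > pivot]
        if k < len(smaller):
            items = smaller            # answer lies among the smaller values
        elif k < len(smaller) + len(equal):
            return pivot               # answer is the pivot itself
        else:
            k -= len(smaller) + len(equal)
            items = larger             # answer lies among the larger values


print(median([1, 5, 2, 8, 7]))       # odd count: the middle value
print(median([1, 5, 2, 10, 8, 7]))   # even count: mean of the two middle values
print(select_kth([1, 5, 2, 8, 7], 2))
```

For an odd number of values, the median equals `select_kth(values, len(values) // 2)`, so the full sort is not strictly needed.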
Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions. The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, in which all values are equally likely.
Mode of a sample: The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. For the data [1, 1, 2, 4, 4], the mode is not unique, because 1 and 4 each occur twice; the arithmetic mean, by contrast, is always unique.
For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning the values to equidistant intervals, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or medium-sized samples, the outcome of this procedure is sensitive to the choice of interval width, which can be too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable. An alternative approach is kernel density estimation, which essentially blurs point samples to produce a continuous estimate of the probability density function, from which the mode can be estimated.
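A minimal Python sketch of the two ideas above: counting frequencies for discrete data, and binning into equal-width intervals for continuous data. The names `modes` and `binned_mode`, the `bin_width` parameter, and the rule for breaking ties (lowest bin wins) are assumptions made for illustration, not part of the answer.

```python
from collections import Counter


def modes(sample):
    """Return every value that attains the maximum frequency, sorted."""
    if not sample:
        return []
    counts = Counter(sample)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def binned_mode(sample, bin_width):
    """Estimate the mode as the midpoint of the fullest equal-width bin.

    Bins start at the sample minimum; ties go to the lowest bin (assumption).
    """
    if not sample or bin_width <= 0:
        raise ValueError("need a non-empty sample and a positive bin width")
    low = min(sample)
    # Map each value to the index of the interval it falls into.
    counts = Counter(int((x - low) // bin_width) for x in sample)
    best_bin = max(sorted(counts), key=counts.get)
    return low + (best_bin + 0.5) * bin_width


print(modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # a single mode
print(modes([1, 1, 2, 4, 4]))                        # two values tie
print(binned_mode([0.0, 0.1, 0.2, 1.5, 3.9], 1.0))   # midpoint of fullest bin
```

As the text notes, the result of `binned_mode` depends on the chosen `bin_width`, so it is worth trying several widths on a small sample.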
An algorithm for computing the mode of a sample, expressed in the language of MATLAB, works as follows. First, sort the sample in ascending order. Next, compute the discrete derivative (the differences between consecutive elements) of the sorted list and find the indices where this derivative is positive; these indices mark where each stretch of repeated values ends. Then compute the discrete derivative of this set of indices, which gives the length of each stretch, and locate its maximum. Finally, evaluate the sorted sample at the index where that maximum occurs, which corresponds to the last member of the longest stretch of repeated values.
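The steps above can be sketched in plain Python instead of MATLAB. This is an illustrative translation, not the original MATLAB code: the function name `mode_by_sorted_runs` is hypothetical, and treating the end of the list as the end of the last stretch is an assumption added so that the final run of values is counted.

```python
def mode_by_sorted_runs(sample):
    """Mode via sorting and differencing, following the steps described above."""
    if not sample:
        raise ValueError("mode of an empty sample is undefined")
    ordered = sorted(sample)
    n = len(ordered)
    # 1-based positions where a stretch of equal values ends: the next value
    # is larger (positive difference), or the list ends (assumed sentinel).
    run_ends = [i + 1 for i in range(n)
                if i == n - 1 or ordered[i + 1] > ordered[i]]
    # Differences of consecutive end positions (with 0 prepended) are the
    # lengths of the stretches.
    starts = [0] + run_ends[:-1]
    run_lengths = [end - start for start, end in zip(starts, run_ends)]
    # Index of the longest stretch; the first one wins if several tie.
    longest = max(range(len(run_lengths)), key=run_lengths.__getitem__)
    # The last member of that stretch in the sorted sample.
    return ordered[run_ends[longest] - 1]


print(mode_by_sorted_runs([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))
```

When several values tie for the highest frequency, this sketch returns the smallest of them, because `max` keeps the first longest stretch it encounters.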
Copyright 2023, Wired Ivy, LLC