Choosing an appropriate statistical technique is very important in any research work, because a poor choice can lead to false conclusions.
I have discovered from experience that many researchers gather data and are then at a loss for a sensible method of analysis. Hence the need to learn how to choose the right statistical tool for analyzing a particular set of data.
Choosing appropriate statistical techniques depends mainly on the research hypotheses and the design of the study as every research design is a plan which directs the researcher on how to collect data and analyze the data collected.
It should be borne in mind that the complexity of any statistical tool is not an indication of its appropriateness or importance.
So, a simple statistical technique may be more appropriate than a complex one. For instance, don’t use ANOVA where a t-test is adequate.
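To see why the simpler test is adequate when there are only two groups, the sketch below computes a pooled-variance t statistic and a one-way ANOVA F statistic on the same two groups. The scores are made up for illustration, and the helper names `pooled_t` and `one_way_f` are hypothetical. With exactly two groups, F equals t squared, so ANOVA adds complexity without adding information.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical test scores for two independent groups of students.
group_a = [62, 70, 68, 75, 66]
group_b = [58, 64, 61, 69, 60]

def pooled_t(a, b):
    """Independent-samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

ANOVA becomes the appropriate choice only when three or more groups are compared.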
Types of Data/Scale of Measurement
You must understand that the nature of the measurement process that produces the statistical data determines the interpretation that can be made from them and the statistical tools that can be used with them!
The most widely used taxonomy of measurement procedures is Stevens’s “Scales of Measurement”, in which he classifies measurement as nominal, ordinal, interval and ratio.
Before you choose a statistical technique for your data analysis, you must first determine the scale of measurement into which your data fall, that is, the type of data you have collected. This is because different statistical techniques lend themselves to different types of data.
Let us look at each of the data types.
Nominal Data– These are data which classify some attribute and are generated by a nominal scale.
- They may be coded as numbers but the numbers have no real meaning.
- They are just labels without any inherent order.
- They are the lowest level of measurement and these types of data are the most restricted in how they can be analyzed.
- A very common way of summarizing nominal data is by reporting the frequency in the form of proportion or percentage in each of the categories.
- The measure of central tendency that can be used is the mode.
Examples of variables that will generate nominal data:
- Gender – male or female
- Class – Level 1, Level 2 or Level 3
- Education Zones – Nasarawa, Dala and Bichi
- Colour – Blue, Red or Yellow
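As a minimal sketch of the summary described above, the code below reports the frequency and percentage in each category and the mode. The colour data and the helper name `summarize_nominal` are hypothetical and used only for illustration.

```python
from collections import Counter
from statistics import mode

# Hypothetical nominal data: favourite colour reported by ten students.
colours = ["Blue", "Red", "Blue", "Yellow", "Red",
           "Blue", "Blue", "Yellow", "Red", "Blue"]

def summarize_nominal(values):
    """Return {category: (frequency, percentage)} for a list of labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

summary = summarize_nominal(colours)
for category, (freq, pct) in summary.items():
    print(f"{category}: {freq} ({pct:.0f}%)")

# The mode is the measure of central tendency that applies to labels.
most_common_colour = mode(colours)
print("Mode:", most_common_colour)
```

Note that no mean or median is computed: the labels have no order, so those statistics would have no meaning here.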
Ordinal Data– These are data that can be put in an order and are generated by an ordinal scale.
- They don’t have a numerical meaning beyond the order
- The differences between adjacent measures may not be equal
Examples of ordinal data
- Questionnaire responses coded 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree and 5 = strongly agree.
- Level of pain felt in a joint, rated on a scale from 0 (comfortable) to 10 (extremely painful).
Interval Data–these are numerical data where the distances between numbers have meaning but the zero has no true meaning.
Examples of Interval data
- Marks (scores) of 90%, 80% and 40% obtained by three students, listed in descending order, show an order of magnitude, and the exact differences between them are known.
- A mark of 0% does not show that a student possesses zero achievement in the subject, and 100% in the test cannot show complete knowledge of the subject. In the same vein, the second student cannot be said to have twice the intelligence of the third student because he scored 80 marks against 40.
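The warning about "twice as much" can be checked numerically. The sketch below uses temperature in degrees Celsius and Fahrenheit, an example chosen here for illustration because both scales have an arbitrary zero. Comparisons of differences survive the change of scale, but the ratio of two readings does not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three hypothetical readings in degrees Celsius.
a_c, b_c, c_c = 40.0, 30.0, 20.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratio of two readings: depends on where the zero happens to be.
ratio_c = a_c / c_c
ratio_f = a_f / c_f

# Ratio of two differences: the same on both scales.
diff_ratio_c = (a_c - c_c) / (b_c - c_c)
diff_ratio_f = (a_f - c_f) / (b_f - c_f)

print(f"Ratio of readings:    {ratio_c:.3f} (C) vs {ratio_f:.3f} (F)")
print(f"Ratio of differences: {diff_ratio_c:.3f} (C) vs {diff_ratio_f:.3f} (F)")
```

Because 40 degrees is "twice" 20 degrees on one scale only, statements about ratios are not meaningful for interval data, while statements about differences are.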
Ratio Data– These are numerical data where both the differences between values and the zero point have real meaning. These types of data are more commonly used in the physical sciences than in the behavioural sciences.
- The least restricted in how they can be analyzed
- All types of statistical procedures are appropriate with the ratio scale
Examples of Ratio data
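- Height of students
- Weight of students
- Temperature of a body measured in kelvin (temperatures in degrees Celsius or Fahrenheit are interval data, because their zero points are arbitrary)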
The four levels of measurement are very important for analyzing the results of a research study. At the lower levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).
In SPSS, you will find the Nominal, Ordinal and Scale (representing interval and ratio data) measurement levels useful when entering data for analysis.
- Only the last two data types (interval and ratio) might be suitable for parametric methods, although, as we will see later, it is not always a completely straightforward decision.
- It should be noted that if an interval/ratio variable such as age is grouped into categories such as 10-19, 20-29, 30-39, 40-49 and so on, it becomes an ordinal variable.
- When documenting research, it is reasonable to justify the choice of analysis, to prevent the reader from believing that the analysis that best supported the hypothesis was chosen rather than the one most appropriate to the data.
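The grouping of age into categories mentioned above can be sketched as follows. The helper name `age_group`, its parameters and the ages are hypothetical. The output keeps the order of the categories but discards the exact distance between any two people, which is what makes the grouped variable ordinal.

```python
def age_group(age, width=10, start=10):
    """Return a category label such as '20-29' for an age in whole years."""
    if age < start:
        raise ValueError("age is below the first category")
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages in whole years (ratio data).
ages = [12, 19, 20, 34, 47, 29]

# After grouping, only the ordered category remains (ordinal data).
groups = [age_group(a) for a in ages]
print(groups)
```

Ages 12 and 19 land in the same category although they differ by seven years, while 19 and 20 land in different categories although they differ by one.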
Parametric or Non-parametric data
Another important step in choosing a statistical tool for data analysis is to determine whether your data are parametric or not.
Ascertaining whether a set of data are parametric or non-parametric can be subtle and complex. The guideline is for you to remember the important rule which is not to make unsupported assumptions about the data.
Don’t assume that your data are parametric, or use a t-test simply because someone (2017) used one; it is better to test the data for “normality” first.
Nor should you assume that a small sample automatically calls for non-parametric tests. Conduct a normality test first, and opt for a non-parametric test only if the data fail it.
- Rank scores or categories are generally non-parametric data
- Measurements that come from a population that is normally distributed can usually be treated as parametric.
If still in doubt, treat your data as non-parametric, especially if you have a relatively small sample.
On a general note, parametric data are assumed to be normally distributed. The normal distribution is a symmetric distribution with most data values near the mean and gradually fewer values farther away from it.
To reasonably apply parametric tests, the data should be normally distributed. But if you are unsure about the distribution of the data in your target population then it is safest to assume the data are non-parametric.
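One way to test for normality rather than assume it is sketched below. It uses the Jarque-Bera statistic, which combines sample skewness and excess kurtosis; this test is swapped in here only because it can be computed with the standard library, and it is not the method prescribed by this article. It is a large-sample test, so treat its p-value as rough for small samples, where a test such as Shapiro-Wilk in a statistics package is usually preferred. The function name and the data are hypothetical.

```python
from math import exp
from statistics import fmean

def jarque_bera(data):
    """Return (skewness, excess kurtosis, JB statistic, approximate p-value)."""
    n = len(data)
    m = fmean(data)
    # Central moments using the population (divide-by-n) convention.
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-x / 2).
    p_value = exp(-jb / 2.0)
    return skew, excess_kurt, jb, p_value

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [2, 4, 4, 5, 5, 5, 6, 6, 8]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15]

for name, sample in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    skew, kurt, jb, p = jarque_bera(sample)
    print(f"{name}: skew={skew:.3f}, excess kurtosis={kurt:.3f}, "
          f"JB={jb:.3f}, p={p:.3f}")
```

A small p-value is evidence against normality; a large one means only that the test found no evidence against it, which is weaker than proof that the data are normal.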
Tests such as the t-test, which depend on assumptions about the distribution of the underlying population, are parametric. So are tests for the significance of a correlation based on Pearson’s product moment correlation coefficient (PPMC), which also assume that the data being tested come from a normally distributed population.
On the other hand, tests that do not depend on many assumptions about the underlying distribution of the data are called non-parametric. These tests include Wilcoxon signed rank test, Mann-Whitney test and Spearman’s correlation coefficient. They are mostly used to test small samples of ordinal data.
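To illustrate the difference between a parametric and a non-parametric measure, the sketch below computes Pearson's coefficient and Spearman's coefficient on the same made-up data. Spearman's coefficient is simply Pearson's coefficient applied to ranks, so it depends only on the order of the values. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean

def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's correlation: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical paired data: the relationship is increasing but not linear.
hours = [1, 2, 3, 4, 5, 6]
score = [2, 4, 8, 16, 32, 64]

r_pearson = pearson(hours, score)
r_spearman = spearman(hours, score)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Spearman's coefficient reports a perfect relationship because the order of the two variables agrees exactly, while Pearson's coefficient is lower because the relationship is not a straight line.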
Before choosing the appropriate statistical tool for your analysis, you must also answer the following questions about your investigation:
- Are your data paired?
Paired data could be generated when there is a before-treatment result (pre-test) and an after-treatment result (post-test), or when a student’s scores in two subjects are being considered.
In such situation, each research subject or participant would have a pair of measurements and it might be that you want to look for a difference in these measurements to show an improvement due to the treatment.
- Paired data are also known as “related samples”
- Non-paired data can also be referred to as “independent samples”
- Scatterplots (also called scattergrams) are only meaningful for paired data
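A minimal sketch of how paired data are handled: each participant contributes one difference (post-test minus pre-test), and the test statistic is built from those differences rather than from the two sets of scores separately. The scores and the function name `paired_t` are hypothetical, and the sketch stops at the t statistic and degrees of freedom without looking up a p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same six students before and after treatment.
pre_test = [45, 52, 60, 48, 55, 63]
post_test = [50, 58, 61, 55, 59, 70]

def paired_t(before, after):
    """Return (differences, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'after' value per 'before' value")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return diffs, t_stat, n - 1

diffs, t_stat, df = paired_t(pre_test, post_test)
print("Differences:", diffs)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The length check reflects the defining property of paired data: every measurement in one set has a partner in the other.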
- Are you looking for cause and effect, differences or a relationship?
Before choosing a statistical test to apply to your data, it is also necessary to be clear about what you are actually looking for, that is, the purpose of your investigation.
- You can look for differences whenever you have two sets of data
- You may also look for correlation when you have a set of paired data, i.e. two sets of data where each data point in the first set has a partner in the second.
- You might as well look for the difference in some attribute before and after some intervention.
- You might want to look at the following descriptive statistics
|WHAT DO YOU WANT TO KNOW DESCRIPTIVELY?|STATISTIC TO USE|
|---|---|
|1. Do you want to know how many students checked each answer?|Frequency|
|2. Do you want to know the proportion of students who answered in a certain way?|Percentage|
|3. Do you want the average of the test scores?|Mean|
|4. Do you want to show the degree to which scores vary from the mean?|Standard Deviation|
|5. Do you want to compare one group to another?|Cross Tab.|
|6. Do you want the middle score in a set of scores?|Median|
|7. Do you want to show the spread or range in scores?|Range|
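The descriptive statistics in the table can each be computed in a line or two. The sketch below uses Python's standard library on made-up data for seven students; the variable names and values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical data for seven students.
answers = ["A", "B", "A", "C", "A", "B", "A"]   # option each student checked
genders = ["M", "F", "F", "M", "F", "M", "F"]   # group each student belongs to
scores = [40, 50, 55, 60, 70, 85, 95]           # test scores

frequency = Counter(answers)                                   # 1. Frequency
percentage = {option: 100 * count / len(answers)
              for option, count in frequency.items()}          # 2. Percentage
average = mean(scores)                                         # 3. Mean
spread = stdev(scores)                                         # 4. Standard deviation
crosstab = Counter(zip(genders, answers))                      # 5. Cross-tabulation
middle = median(scores)                                        # 6. Median
score_range = max(scores) - min(scores)                        # 7. Range

print("Frequency:", dict(frequency))
print("Percentage:", {k: round(v, 1) for k, v in percentage.items()})
print("Mean:", average)
print("Standard deviation:", round(spread, 2))
print("Cross-tab (gender, answer):", dict(crosstab))
print("Median:", middle)
print("Range:", score_range)
```

Here `stdev` is the sample standard deviation; use `pstdev` from the same module if the scores are the whole population rather than a sample.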