The practice of statistics 5th edition pdf chapter 1

Data Sets


back to top

Errata for The Practice of Statistics


back to top

FRAPPY! Student Samples


back to top

HP Prime Technology Corners


back to top

Statistical Applets


back to top

Technology Corner Videos


back to top

Worked Example Videos


back to top

Worked Exercise Videos


back to top
  • Chapter 1: Exploring Data
  • Chapter 1.1: Analyzing Categorical Data
  • Chapter 1.2: Displaying Quantitative Data with Graphs
  • Chapter 1.3: Describing Quantitative Data with Numbers
  • Chapter 10: Comparing Two Populations or Groups
  • Chapter 10.1: Comparing Two Proportions
  • Chapter 10.2: Comparing Two Means
  • Chapter 11: Inference for Ditribution of Categorical Data
  • Chapter 11.1: Chi-Square Tests for Goodness of Fit
  • Chapter 11.2: Inference for Two-Way Tables
  • Chapter 12: More About Regression
  • Chapter 12.1: Inference for Linear Regression
  • Chapter 12.2: Transforming to Achieve Linearity
  • Chapter 2: Modeling Distributions of Data
  • Chapter 2.1: Describing Location in a Distribution
  • Chapter 2.2: Density Curves and Normal Distributions
  • Chapter 3: Describing Relationships
  • Chapter 3.1: Scatterplots and Correlation
  • Chapter 3.2: Least-Squares Regression
  • Chapter 4: Designing Studies
  • Chapter 4.1: Sampling and Surveys
  • Chapter 4.2: Experiments
  • Chapter 4.3: Using Studies Wisely
  • Chapter 5: Probability: What Are The Chances
  • Chapter 5.1: Randomness, Probability, and Simulation
  • Chapter 5.2: Probability Rules
  • Chapter 5.3: Conditional Probability and Independence
  • Chapter 6: Random Variables
  • Chapter 6.1: Discrete and Continuous Random Variables
  • Chapter 6.2: Transforming and Combining Random Variables
  • Chapter 6.3: Binomial and Geometric Random Variables
  • Chapter 7: Sampling Distributions
  • Chapter 7.1: What Is a Sampling Distribution?
  • Chapter 7.2: Sample Proportions
  • Chapter 7.3: Sample Means
  • Chapter 8: Estimating With Confidence
  • Chapter 8.1: Confidence Intervals: The Basics
  • Chapter 8.2: Estimating a Population Proportion
  • Chapter 8.3: Estimating a Population Mean
  • Chapter 9: Testing A Claim
  • Chapter 9.1: Significance Tests: The Basics
  • Chapter 9.2: Tests about a Population Proportion
  • Chapter 9.3: Tests about a Population Mean
  • Chapter Introduction: Data Analysis: Making Sense of Data

Textbook: The Practice of Statistics
Edition: 5

Author: Daren S. Starnes, Josh Tabor
ISBN: 9781464108730

1 EXCEL MANUAL for Moore s The Basic Practice of Statistics Fifth Edition Betsy Greenberg University of Texas Austin with solutions by Linda Myers Harrisburg Area Community College W.H. Freeman and Company New York

2 Copyright 2010 by W.H. Freeman and Company No part of this book may be reproduced by any mechanical, photographic, or electronic process, or in the form of a phonographic recording, nor may it be stored in a retrieval system, transmitted, or otherwise copied for public or private use, without written permission from the publisher. ISBN-13: ISBN-10:

3 Contents Preface Chapter 0 Introduction 1 Chapter 1 Looking at Data: Exploring Distributions 22 Chapter 2 Looking at Data: Exploring Relationships 38 Chapter 3 Producing Data 49 Chapter 4 Probability 54 Chapter 5 Sampling Distributions 65 Chapter 6 Introduction to Inference 68 Chapter 7 Inference for Distribution s Chapter 8 Inference for Proportions 85 Chapter 9 Inference for Two-Way Tables 62 Chapter 10 Inference for Regression 98 Chapter 11 Multiple Regression 105 Chapter 12 One-Way Analysis of Variance 114 Chapter 13 Two-Way Analysis of Variance 118 Chapter 14 Bootstrap Methods and Permutation Tests 123 Chapter 15 Nonparametric Tests 130 Chapter 16 Logistic Regression 135 Chapter 17 Statistics for Quality 137 Chapter 18 Exercises Time Series Forecasting

4 PREFACE This Excel Manual is a supplement to Statistics textbooks by David S. Moore, et al. that are published by W.H. Freeman. This manual is intended to help student perform the analysis described in those textbooks. Excel is widely available as part of Microsoft Office. It contains some statistical functions in its basic installation. It also comes with statistical routines in the Data Analysis Toolpak, an add in found separately on the Microsoft Office CD. Excel is a useful teaching and learning tool, however it is not meant to replace more sophisticated statistical tools such as SPSS or SAS. People often use Excel as their everyday statistics software because they have already purchased it. This manual helps students understand the capabilities of Excel for statistical analysis. In addition to describing the standard features of Excel, this manual also illustrates the capabilities of the WHFStat Add In module. The WHFStat Add In module is available from W.H. Freeman. The module is programmed to include the following procedures and graphical analyses under the umbrella of a single menu. Descriptive statistics Probability calculations Discrete probability Distributions Estimating and Testing Means Proportion Testing Correlation and Regression Time Series Forecasting Two way table and Chi squared test Analysis of variance Graphs including normal quantile plots, boxplots, and control charts Betsy Greenberg University of Texas at Austin August 2008

5 CHAPTER 0 Introduction Microsoft Excel is a widely used spreadsheet application that millions of people use in their personal and professional lives to store, analyze, and present information. This manual describes how Microsoft Excel can be used effectively in your statistics course. Using Excel Microsoft Excel, commonly referred to as just Excel, is a spreadsheet program that organizes data in columns and rows, much like an accounting worksheet or table of data. Excel can also perform statistical analysis using built-in functions. The WHFStat Add-In software works within Excel to group all of the statistics functions into one menu. This software is described in section 0.8, and is available on StatsPortal, your Online Study Center, or packaged with this manual. 1

6 2 Chapter 0 Versions of Excel The examples in this book were written using Microsoft Excel The WHFStat Add In module operates with Excel 2003 or Excel 2007 under either the Windows Vista or Windows XP operating systems. It is also compatible with Excel 2004 for Macintosh. Versions of Excel prior to version 2003 cannot be used with this software. Prior Knowledge It is not necessary to have any prior knowledge of Excel to use this manual. However, it will be helpful to become familiar with Excel before using it for statistical analysis. Worksheet Basics When Excel is launched, a new file opens with a series of blank worksheets, also known as sheets. The file itself is called a workbook, which refers to the entire collection of spreadsheets, graphs, and user-developed programming code in the file. The figure below is a screenshot of a blank sheet in the Excel 2007 application. In the upper-right corner of the application window are three buttons that allow the user to minimize, maximize, or close the window. Notice that there are two sets of these buttons, one in the top grey portion of the window and one in the lighter blue area. This is because the Excel application is actually displaying one window within another. Clicking the middle of the three buttons (ignore the question mark button for now) in the light-blue area will make the two windows more prominent, as shown below.

7 Introduction 3 The outer window is the Excel application window, which contains all of the buttons and menus that control the functionality of the program. The inner window contains the workbook with all of its sheets. Looking more closely at this inner window reveals a number of controls that allow the user to navigate around the active worksheet or to display other sheets in the workbook.

8 4 Chapter 0 Sheet Tabs Each worksheet is labeled with a tab at the bottom of the workbook, and individual sheets are activated by clicking these tabs. More than one sheet can be activated by selecting the first sheet, holding the Control Key down, and selecting additional sheets as required. If there are too many sheet tabs to be displayed all at once, the tab scrolling buttons can be used to bring a particular tab into view. Double clicking or right-clicking the name on a tab allows the sheet to be renamed. Sheets can be rearranged by dragging and dropping a given sheet tab to a new location within the group of tabs as a whole. Clicking the Insert Worksheet button adds a new blank sheet to the workbook. The scroll bars allow the portion of the spreadsheet currently displayed to be moved left or right, up or down. Rows, Columns, and Cells Notice that the worksheet is divided into a series of columns labeled with letters at the top, and a series of rows labeled with numbers on the far left. At the intersection of any column and row is a discrete portion of the sheet called a cell. All numeric and text data for a worksheet is housed within these cells. An individual cell is identified by the row and column in which it resides. For example, the cell located at the intersection of column A and row 1 is identified as A1, which is also known as the cell s address. Selecting Cells and Ranges In order to enter data in a cell, the target cell must first be selected. A cell is selected by using the mouse to click on a specific cell s location or by typing the arrow keys until the proper cell is reached. When a cell is selected, it is surrounded by a heavy black outline and the row and column headings corresponding to that cell are highlighted, as shown for cell A1 in the picture above. The highlighted cell is known as the active cell, and any numbers or text revisions to the spreadsheet will always be added to this active cell. When a new workbook is created, cell A1 in Sheet1 is automatically selected as the active cell. Clicking and dragging across more than one cell selects all the cells across the entire region, known as a cell range. To select multiple ranges, select the first range, hold the Control key down, and select any other ranges of cells. Clicking the Select All button selects all cells in the current worksheet. The worksheet shown below illustrates four types of ranges: an individual cell, a partial row of cells, a partial column of cells, and a range crossing multiple columns and rows. A range must be a rectangular shape or a group of adjacent cells. The active cell is always the upper-left cell of the last selected range. Just like individual cells, ranges have addresses that describe the cells they contain. A range address is composed of the upper-left cell in the range, a colon, and the lower-right cell in the range. The pictured ranges have the following range addresses: B2, B5:B10, D2:H2, D4:G11. The current active cell as pictured is D4.

9 Introduction 5 Using Excel s Functionality The Ribbon Excel 2007 introduced a new interface for accessing Excel s various controls and functionality called the Ribbon. The ribbon provides a series of context specific commands, grouped together so that similar commands display at the same time. The various groups of commands are accessed by selecting a ribbon tab near the top of the ribbon. Each tab displays a series of related commands. Brief overviews of the commands available in the ribbon tabs are outlined in the table below: The most commonly used functions cut, copy, and paste; font formatting and alignment, number formats, cell background color and borders, inserting and deleting cells, sorting, filtering, and finding/replace functions Functions to insert tables, charts, artwork, graphics or specialized text

10 6 Chapter 0 Printing options, workbook themes and colors, margins, page breaks, and scaling Controls to assist the user in creating, editing, and auditing formulas and calculation options Sorting and filtering, data validation, outlining, connecting to data in external sources such as databases or the internet Spellcheck and proofing tools, protecting and sharing workbooks, adding comments, tracking changes Display the workbook in various ways, hide/display gridlines and headings, arrange and size windows The Office Button Commands to open a new or existing workbook, save changes, and print can be found by clicking the circular Office button in the upper-left corner of the application window. For users accustomed to Excel versions prior to 2007, these commands correspond to the old File menu, which is not part of the new Excel ribbon. The Quick Access Toolbar Excel 2007 also gives the user the opportunity to place some of the most commonly used commands in a special Quick Access toolbar that is always available, regardless of which ribbon tab is currently selected. It is located just right of the Office button in the upper-left corner of the application window. Saving a file, undoing or redoing a change, previewing or printing can easily be added to this menu by selecting options in the small drop-down arrow at the right end of the toolbar. Excel also provides the ability to add practically any built-in command to this toolbar, so if a particular command is frequently used, it can be added here, rather than constantly selecting it on the ribbon. The Formula Bar and Name Box Although cell contents can be edited directly in the active cell, it is generally easier to edit cell contents by using the formula bar, located below the ribbon and just above the worksheet window. This is particularly useful for long formulas or text. When cell contents are being edited, two buttons appear, which allow the user to cancel the changes being made (the Cancel button, marked with an ) or accept the changes as entered (the Enter button, marked with a check mark ). Always available is the Insert Function button, which easily allows the user to select one of many pre-existing Excel functions for use in the cell being edited (see Entering Data: Formulas below). Once a function has been selected using this tool, Excel displays a helpful interface to assist the user in building the formula correctly.

11 Introduction 7 Sometimes the data displayed in the Formula bar is too long to be displayed in a single line. The height of the formula bar can be adjusted to display multiple lines by dragging the bottom portion of the formula bar downward. Once adjusted, one can toggle between the single line and multiple line displays by clicking the double downward arrow button at the end of the Formula bar. To the left of the Formula bar is the Name box, which displays the address of the current active cell D3 in the picture above. This can also be used to quickly navigate to a specific cell by typing the cell address into the Name box and hitting Return. The requested cell is selected and the spreadsheet is scrolled to the appropriate location. When editing formulas, the Name box displays the most frequently used Excel functions, allowing the user to easily add them to the current formula by selecting from a drop-down menu.

12 8 Chapter 0 The Status Bar, Zoom Slider, and Window Size Control At the bottom of the Excel application window are several more useful features. The status bar, located at the far left, displays messages about the current status of the Excel application. Rightclicking the status bar allows the user to select a number of options for what is displayed, such as whether or not Caps Lock is turned on or quick sums, counts, and averages of the currently selected cells. Just to the right of the status bar are three buttons that allow the user to switch between Normal, Page Layout, and Page Break preview views. Just to the right is a slider that controls the current Zoom setting for the worksheet. This can be adjusted to make a greater or lesser portion of the spreadsheet be displayed in the current window. In the bottom-right corner of the window is the Window Size Control, which can be used to adjust the size of the application window. The Help Button Near the top-right corner of the application window is a blue circle containing a question mark. This Help button, activates Excel s Help system. An extensive amount of information comes pre-loaded with the Excel application. Excel also automatically searches for the most up-to-date information on Microsoft s Excel website. While this manual provides a quick summary of the most basic Excel functionality, the Help system will provide more detailed information on specific topics as you need them.

13 Introduction 9 Entering Data Three types of information can be entered into a cell: text, numeric values, and formulas. Text Text can be entered in any combination of letters, numbers, or special characters. By default, text is aligned to the left within the cell. This can be changed via the Alignment group of buttons on the Home tab of the Ribbon (Ribbon Home Alignment). Although an individual cell can contain 32,767 characters, generally large strings of text are broken up into smaller pieces and spread across multiple cells. If the text entered into a cell is longer than the width of the cell allows to be displayed, the display is truncated. The completed text is still housed within the cell, and can be viewed in the formula bar. See the appropriate section above for instructions on how to adjust the amount of lines displayed in the Formula bar. The font type, size, color, and other font formatting features are adjusted using the controls in the Ribbon Home Font group. Numeric Values Excel is used primarily to perform calculations, so typically many of the cells in a spreadsheet contain numbers. Excel can be instructed to interpret the number in a specific cell as a date or time, a fraction, an amount of currency, a percentage, a phone number, or just a regular number. This is controlled with the buttons in the Number group on the Home tab of the Ribbon (Ribbon Home Number). If you enter a number and it appears differently than expected, try changing the cell s number format settings. For example, when entering 1/4 into an unformatted cell, Excel

14 10 Chapter 0 displays this as 4-Jan. Excel has interpreted the entry as the short format for a date and displayed it in the default date format. Once the number format for a cell has been specified by the user, it retains that format until changed. By default, numbers are right aligned, but this can be changed with the Alignment controls as described above for Text entries. A few rules to keep in mind when entering numeric values: No spaces allowed, The first character of a number must be 0 through 9, +,, or $. The number can include commas, decimal points (using the period key) or forward slashes (such as with dates or fractions). Negative numbers are designated with a preceding negative sign (-) or by surrounding the number with parentheses. Numeric values that do not follow the guidelines listed above, or that contain letters or other characters are interpreted as text. To force a number to be interpreted as text, precede the number with an apostrophe (single quote). If a cell is too narrow to display the entire number it contains, Excel instead displays a series of # signs. To display the number correctly, adjust the column width as described in the Formatting a Worksheet Section below. Formulas Formulas are mathematical expressions that can use values or formulas in other cells to calculate new values. Formulas can include numbers, cell addresses, multi-cell ranges, functions, and text. Upon entering a formula into a cell, the result of the formula is displayed in the cell itself and the equation is displayed in the Formula bar. To create a formula, make sure the first character within the cell is an equal sign (=). This alerts Excel that the following data entered in the cell should be interpreted as a formula. Excel uses the following symbols for these most common mathematical operations: the plus sign (+) for addition the minus sign (-) for subtraction the asterisk (*) for multiplication the forward slash (/) for division the caret symbol (^) for exponentiation the open and close parentheses ( ) for grouping parts of the formula

15 Introduction 11 Example formula =A3+ (C5) ^2 If a formula refers to a cell address for a value, and the value in that cell is changed, the formula is automatically updated and the new value displayed. This allows the user to continually update values throughout the spreadsheet and immediately see the resulting changes in the formulas as those value changes are made. Functions Functions can be used for arithmetic, statistical, scientific, logical or financial calculations, or even to manipulate text and find values within the spreadsheet. The most common functions are SUM, COUNT, AVERAGE, MAX, and MIN, but there are hundreds of functions available for your use. The general format for a function is an equal sign (just as with any formula), the capitalized name of the function itself, an open parenthesis, one or more arguments, and a close parenthesis. Arguments are the specific pieces of data required by that function to do the calculation. For example, a formula using the AVERAGE function would typically be of the form =AVERAGE (A2:A25). We have supplied the cell range A2:A25 for the argument. The cell range can either be typed into the formula, or it can be entered by dragging the mouse across the appropriate cell range when that portion of the formula is reached. The function name itself is not case sensitive and will be capitalized automatically when entry of the formula has been completed. While a function can be typed directly into a cell, it is much easier to use the built-in Insert function button, located in the Formula bar. Clicking this displays the following interface, which guides the user through searching for an appropriate function and entering the data for any required arguments.

16 12 Chapter 0 Modifying Data Editing Once a cell s content has begun to be entered, the backspace or delete keys can be used to modify the contents. To discard the changes completely, type the ESC key or click on the Cancel button in the Formula Bar. If the cell s content has been entered previously, it can be revised by double clicking the cell and moving the cursor to the appropriate location within the contents of the cell. Cell contents can also be edited by clicking the cell and changing the contents displayed in the Formula bar. Deleting or Clearing Data To delete cell contents, select the range of cells to be deleted and type the Delete key. This does not remove the actual cell from the spreadsheet, just its contents. Any cell formatting will remain. To have the option to remove cell contents, formatting, comments, or all three at once, select the range of cells to be cleared and click the Clear button within the Editing group on the Home tab of the Ribbon (Ribbon Home Editing Clear).

17 Introduction 13 The following options will be displayed: Clears formats, contents, and comments as described below Clears any background or border coloring, specific font styles or number formats, conditional formatting, cell alignment, etc. Clears data entered in the cell, similar to typing the Delete key Removes any comments attached to the cell Inserting and Deleting Rows and Columns Sometimes after data has been entered into a series of rows, it becomes necessary to insert new data between two of the existing rows. To insert a row, click the numbered row heading of the row beneath where you want to add the row and click the Insert button within the Cells group on the Home tab (Ribbon Home Cells Insert). A row will be inserted and any data previously in the selected row or below is shifted down. To insert more than one row, click and drag on more than one row heading and click the Insert button. New rows are added and old rows are shifted as appropriate. Inserting columns functions in much the same way, except one clicks on the desired number of column lettered headings immediately to the right of where the new columns should be inserted. Clicking the same Insert button executes the action. To delete rows or columns, select the specific rows or columns to be deleted and click the Delete button (Ribbon Home Cells Delete), which is right next to the Insert button. As rows or columns are deleted, all rows beneath or all columns to the right of the deleted section are shifted to fill the gap.

18 14 Chapter 0 Moving, Copying, and Filling Information Once cells contain content, that content can easily be moved or copied to another location within the same sheet, to another sheet, to a sheet in a different workbook, or even to another application. Cut and Paste Cell Content You may be familiar with the practice of cutting and pasting data in other applications, and Excel provides this functionality as well. Select the range of cells that contain the information to be moved and click the Cut button (Ribbon Home Clipboard Cut). Alternately, after selecting the target range of cells, right-click and select Cut from the pop-up menu or type Ctrl+X. All three methods of cutting place the entire contents of the selected cell range in Excel s memory (referred to in all Microsoft Office products as the Clipboard). At this point, the data has not yet been moved from the cells, but the selected cut range is indicated with a flashing dotted line surrounding it. Next, select the upper-left cell of the new area where you want the data you have just cut to be pasted. Click the Paste button (Ribbon Home Clipboard Paste) and the data from the old cells is placed within the new ones.

19 Introduction 15 Copy and Paste Cell Content To copy a target range to another location, with the old data remaining where it was, use the same basic method as described above, but select the Copy button or menu option instead of Cut. Drag and Drop Cell Content A target range of cells can also be dragged and dropped to another location on the same sheet. To do so, select the cell range to be moved. Notice that when the mouse pointer is placed directly over the heavy black line surrounding the selected range that the pointer changes to a small cross with four arrows. When the four arrow cross is displayed, click and hold the mouse, dragging the mouse to another location on the spreadsheet. The entire selected range of cells moves along with it, including all content and formatting.

20 16 Chapter 0 AutoFill Cell Content Excel also provides a simple way to populate data or formulas across a range of cells, or to create an incremental data series. To simply copy data or a formula across a range, select the cell to be copied. Notice that there is a small square (called a fill handle ) in the bottom-right corner of the heavy line surrounding the selected cell (circled in the image to the left). When the mouse pointer is placed over this handle, the arrow pointer becomes a crosshair. As that crosshair is displayed, click on the fill handle and drag the mouse down or to the right across the cells to be filled. Upon releasing the mouse, the data or formula in the original cell is copied across the range. Any formatting in the original cell is copied as well. To create an incremental series, type the first two numbers in the series in adjacent cells. Following the same procedure as described for copying above, select the two cells containing the first two data points in the series and drag the fill handle across the appropriate number of cells for the whole series. Based upon the first two numbers entered, Excel AutoFills the remainder of the series. Excel can also AutoFill the names of months. Simply enter January or Jan, and using the AutoFill method described above, Excel fills in the remainder of the months in the format entered. After December, Excel continues on with January again, filling in each successive month over the entire dragged range.

21 Introduction 17 Formatting a Worksheet Excel provides a wealth of tools to customize the look and feel of spreadsheets. First, select a cell or a range of cells to be formatted. Using the buttons on the Home tab in the Font, Alignment, Number, Styles, and Cells groups, the background color, border colors, row height, column width, fonts styles, and size and number formats can all be changed. Individual cells can be merged using the Merge and Center options in the Alignment group. Preprogrammed formats can be applied using the Cell Styles options in the Styles group. Adjusting Column Width and Row Height The height and width of an individual row or column can be changed by clicking and dragging the line between the rows or column in their respective headers. When pointing the mouse directly at the line between row headings or column headings, the pointer arrow changes to a line with arrows pointing in two directions (see images below). With this double-arrowed pointer displaying, click and hold the mouse. Light gray lines show the current boundaries of the row or column being adjusted. Dragging the mouse widens or narrows these boundaries to display the proposed width or height. If multiple rows or columns are selected at one time, the height or width is adjusted for the entire selection. Alternately, you can double click on the line between row or column headings for a best fit option for the selection. Row or column headings can also be right-clicked to display a menu that includes a Row Height or Column Width option. Formatting Cells Many cell, font, and number formatting options are available directly from the Home tab of the Ribbon, the Format Cells feature provides additional formatting options. It can be accessed by clicking the arrow-within-square button located in the bottom right of the Font, Alignment, and Number groups on the Home tab (see the circles in the image below).

22 18 Chapter 0 Adjusting the settings in the format cells will change the format in the selected cells. Tabs at the top provide the following formatting controls: Number Alignment Font Border Fill Protection Select number format styles of general, currency, date, time, percentage, etc. Also controls the number of decimal places displayed, whether or not commas or currency symbols are displayed, and how negative values are differentiated. Horizontal and vertical text alignment, direction of text, such as at a 45ยบ angle, whether to wrap text, indent, etc. Select font family and size, options for bold, italic, underline, strikethrough, superscript, subscript, and font color. Turn on or off borders at each of the four sides of a cell or the two diagonals. Weight, style, and color of each border segment can be adjusted independently. Control the color and pattern of cell interior backgrounds. Control whether users can edit the contents of cells or view cells formulas in the Formula bar. As with all formatting options, this can be controlled on a cell-bycell basis.

23 Introduction 19 Printing Page Setup Options Excel provides a number of tools to configure how the spreadsheet will look when it is printed. The primary printing options are located on the Page Layout tab of the Ribbon in the Page Setup group (Ribbon Page Layout Page Setup). Options include controls to adjust the page margins, page orientation, the expected size of the paper being used for printing, background images, where page breaks occur, which cells in the spreadsheet will be printed, and whether or not to repeat certain rows at the top or certain columns at the left of each page. Clicking the arrow-within-square button in the bottom right of the Page Setup group opens up a more detailed Page Setup interface with a greater level of control for these options as well as the ability to specify page headers and footers.

24 20 Chapter 0 Print, Quick Print, and Print Preview To access printing options, click the circular Office button in the top left of the Excel application window. From the Office menu, select the Print sub-menu, as displayed on the left. Selecting the Print command will display an interface that allows the user to select a printer and printing options. The Quick Print command will print the current spreadsheet using the default printer and default print options. The Print Preview command allows the user to see how the spreadsheet will appear before printing it. Using Excel s Statistical Tools Excel contains a set of pre-built statistical analysis tools as part of the Analysis Toolpak add-in included with the Excel software. For some Excel installations, it will need to be turned on. Doing so requires the following steps: 1. Click the circular Office button in the upperleft corner of the Excel application window. 2. In the light-blue border at the bottom of the Office menu, click the Excel Options button. 3. In the pop-up interface that displays, select Add-ins from the navigation bar on the left. 4. A list of available add-ins will be displayed. It may take a few moments for Excel to collect this information. At the bottom of the list, there should be a drop-down menu with add-ins selected. Click the Go button next to this. 5. Make sure that Analysis ToolPak is checked. Note: If the Analysis ToolPak is not listed, it will need to be added from the Excel installation software. Once the Analysis ToolPak add-in is installed, there should be a new analysis group available on the Data tab of the Ribbon (Ribbon Data Analysis). Click the Data Analysis button, and the following list of available analysis tools will be displayed.

25 Introduction 21 Many of the included Excel statistical analysis tools are detailed where appropriate in the exercises in the following chapters. Where appropriate, exercises taken from the textbook are solved using both the Excel analysis tools and the WHFStat add-in module packaged with this manual. The Excel solutions are identified by this icon The WHFStat solutions are identified by this icon Using the WHFStat Add-In Module WHFStat is an Excel Add-in, software that makes it easier to use Excel to do most statistical operations. The software is available on StatsPortal, your Online Study Center or packaged with this manual. Once installed, the WHFStat Add-In module will be integrated into your Excel application and will automatically load every time you open Excel. You will notice a new Add-Ins tab on the Ribbon, upon which the Menu Commands group will have a button labeled WHFStat. Clicking this will display the various menu options available for the add-in.

26 CHAPTER 1 Looking at Data: Exploring Distributions Bar Charts Excel allows us to examine the distribution of variables with graphs. Bar charts are useful for categorical data. The following data provides the tire model reported for 2969 accidents that involved Firestone tires. 22

27 Looking at Data: Exploring Distributions 23 We will use this data to make a Bar Chart with Excel. Highlight the data and select Insert Column 2 D Column as shown below. Clicking on the 2 D Column will produce the following bar chart.

28 24 Chapter 1 Pie Charts Another way to examine distributions of categorical variables is with a pie chart. We will continue to use the data from the previous example to show how to make a pie chart with Excel. To make a pie chart of the waste data, select Insert Pie 2 D Pie from the menu.

29 Looking at Data: Exploring Distributions 25 Although we highlighted three columns, Excel used only the first two when constructing the pie chart. The values in the column labeled percent can be used by changing the data. If you right click on the pie chart and choose Select data, the Select Data Source dialog box pops up. In that dialog box, you can delete the Count series so that the Percent series will be used instead. Alternatively, you can click on the graph and select the Design tab from the menu to select an alternative presentation such as the one shown below. Histograms The most common graph for the distribution of a quantitative variable is a histogram. We will illustrate this with IQ test scores for 60 fifth grade students. To create a histogram, select Data Data Analysis Histogram

30 26 Chapter 1 from the menu. In the dialog box, specify the input range to be where the data is located as shown below. If the first cell is a label, check the Labels box. The Output Range specifies where the output will be placed. Finally, check the Chart Output box and then click OK. The default histogram appears as follows. You can also specify alternative bin ranges to avoid the default values selected by Excel. It is helpful to first select Data Sort from the menu to sort the data before deciding on the Bin Ranges. The new values are then typed into a column on the Excel worksheet. If the data has a label and the Label box will be checked, then this new column should also have a label. The new column is then entered into the Histogram dialog box next to Bin Range.

31 Looking at Data: Exploring Distributions 27 The new histogram will use the selected bin ranges, but not be entirely satisfactory. For example, the bin ranges in the histogram below appear to be interval midpoints instead of cutpoints. It appears from this histogram as though there are no observations below 85, when in fact there are. The gap width between the bars can be changed or eliminated by right clicking on a bar and selecting Format Data Series and changing the option in the following dialog box.

32 28 Chapter 1 Time Series Plots When quantitative data are collected over time, it is a good idea to plot the observations in the order they were collected. For example, the following data lists the volume of water discharged by the Mississippi River in the Gulf of Mexico for each year from 1954 to Year Discharger Year Discharge Year Discharger Year Discharge To make a time series plot of this data, highlight the data and select Insert Scatter and select a graph design from the menu.

33 Looking at Data: Exploring Distributions 29 The time series plot will appear as soon as you click on the plot design of your choice. If Scatter with Straight Lines is selected, the plot will appear as follows. As usual, the plot can be altered by clicking on the graph and selecting the Design tab. Since the dates appeared in one column with the data in the next column to the right, the time plot has the dates on the x axis and the data on the y axis. If the data appears in a different or

34 30 Chapter 1 der, you can right click on a data point and choose Select Data. The dialog box that is shown below allows you to switch columns, and add or delete a series so that you can use the appropriate data. If your data is not accompanied by a column of dates, highlight only that data and select Insert Line from the menu. Select the design that you prefer to obtain a time plot. In this case, the x axis will be labeled with consecutive numbers instead of dates. Numerical measures are often used to describe distributions. Select Data Data Analysis Descriptive Statistics from the menu to obtain descriptive statistics. Enter the input range for the data. If the data includes a label in the first row, check the appropriate box. Specify where the output will appear, check the box next to Summary Statistics, and click OK in the following dialog box.

35 Looking at Data: Exploring Distributions 31 The command summarizes several different measures of both the center and spread of a distribution. The command prints the statistics Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count, for each column specified. Count is the number of actual values in the column (missing values are not counted). Mean is the average of the values. To find the median, the data first must be ordered. If N is odd, the median is the value in the middle. If N is even, the median is the average of the two middle values. StDev is the standard deviation computed as ( ) StDev = x i x N 1 Standard Error is the standard error of the mean. It is calculated asstdev N. The same results can be obtained using functions in Excel. For example, typing =AVERAGE(A2:A61) into a cell gives the mean, =STDEV(A2:A61) gives the standard deviation, and =COUNT(A2:A61) gives the sample size. In addition, we can obtain the quartiles needed for the five number summary using the QUARTILE function. If you click on the Insert Function,, you can search for the appropriate function as shown below. 2

36 32 Chapter 1 As shown below, you can obtain the function either by typing the formula int0 an empty cell, clicking on an empty cell and then typing the formula into the formula bar to the right of the Insert Function button, or by filling in a dialog box. Either way, you must specify the Array that holds the data and then the Quart, where Quart = 0 is the minimum value, 1 is the first quartile, 2 is the median, 3 is the third quartile, and 4 is the maximum. Excel doesn t use exactly the same algorithm to calculate quartiles as your textbook, so minor differences in results will sometimes occur. The five number summary consisting of the median, quartiles, and minimum and maximum values provides a quick overall description of a distribution.

37 Looking at Data: Exploring Distributions 33 Normal Calculations Sometimes the Normal density can describe the overall pattern of a distribution. A histogram may be helpful in deciding when this is appropriate. Normal quantile plots are also useful in determining whether a distribution is approximately Normal. If the points on a Normal quantile plot lie close to a straight line, the plot indicates that the data are Normal. Excel can be used to perform Normal distribution calculations. If data in a column are Normally distributed, then the data can be standardized to obtain data with a standard Normal distribution, that is, those with mean equal to zero and standard deviation equal to one. The STANDARDIZE(x,mean,standard_dev) function can be used to do this. The function requires that you specify the x value that you want to standardize, the mean of the distribution and the standard deviation of the distribution. The function returns the standardized value, z = ( x x) s. For example, if the IQ scores are normally distributed with a mean equal to 100 and a standard deviation equal to 10, then we can standardize as shown below. If we copy the formula down the column, we can obtain the standardized values for all of the vocabulary scores. The standardized IQ scores (or z score) will tell how far above or below the mean a particular score falls. The measure is in units of standard deviations. The first student has a score of 81, a value that is below the mean (z = 1.9). Another score, 117 is above the mean (z = 1.7). We could examine the standardized values to see how well they obey the rule. Approximately 68% of the standardized values should have values between 1 and +1, 95% should have values between 2 and +2, and 99.7% should have values between 3 and +3. You can use Excel to do probability calculations for the Normal distribution using the NORMDIST(x,mean,standard_dev,cumulative) function. When cumulative=1, this function returns the normal distribution for the specified mean and standard deviation. In addition to the 1 for cumulative, you must specify the x value for the distribution along with the mean and standard deviation. For example, the heights are approximately Normal with a mean of about 64 inches and a standard deviation of 2.7 inches. To find the proportion of women who are less than 70 inches tall, we select type =NORMDIST(70,64,2.7,1) into a cell or the formula bar. Alternatively, click Function Wizard on the formula bar to select the NORMDIST function and fill in the dialog box.

38 34 Chapter 1 The result says that the proportion of women who are less than 70 inches tall is , or nearly 99%. This is slightly different from the result that would be obtained using Table A since it is not required to round the standardized value. We can also use Excel to do backward calculations. The length of human pregnancies in days from conception to birth follow approximately the N(266,16) distribution. To find the length of the longest 10% of pregnancies, we can use the NORMINV function. The function requires that we specify the appropriate probability along with the mean and standard deviation of the distribution. Since we want the value for the top 10%, the input constant is 0.9 corresponding to 90% below the calculated value. As for all functions, we click on the cell where you want the results and then type in that cell or on the formula bar as shown below, or click on the function wizard. The function returns the inverse of the normal cumulative distribution for the specified mean and standard deviation. In this example the value returned is 286.5, indicating the the longest 10% of human pregnancies last at least days.

39 CHAPTER 2 Looking at Data: Exploring Relationships Scatterplots Often we are interested in illustrating the relationships between two variables, such as the relationship between height and weight, between smoking and lung cancer, or between advertising expenditures and sales. For illustration, we will consider the relationship between the number of items sold and gross sales at Duck Worth Wearing, a shop selling high quality, second hand children s clothing, toys, and furniture. If both variables are quantitative, the most useful display of their relationship is the scatterplot. Scatterplots can be produced by highlighting the variables in the scatterplot and selecting Insert Scatter Scatter with only Markers from the menu. The explanatory variable should be plotted on the x axis and the response variable should be plotted on the y axis. The highlighted columns should have the x variable on the left and the y variable on the right, so you may need to rearrange your data. If so, you 35

40 36 Chapter 2 can highlight the column with the y variable and select Home Insert Insert Sheet Columns to add space for a column to the left of the y variable. Copy the x variable into the empty space. Once the data are correctly arranged and you click on the Scatter with only Markers button, your scatterplot will appear. Initially, your scatterplot may not look the way you want it to. If you are clicked on the chart, Chart Tools will also appear at the top of the Excel menu. These tools allow you to modify the chart. Choose layout and modify as desired. For example, select Layout Axis Titles Primary Horizontal Axis Title Title Below Axis to add an x axis title. Click the axis title and type the text that you want. The data in the scatterplot are positively associated, in a roughly linear pattern with no clear outliers.

41 Looking at Data: Exploring Relationships 37 We can add information about a third categorical variable to a scatterplot by using different symbols for different points. The Duck Worth Wearing store is open Monday through Saturday. The five Saturdays in April 2000 (04/01, 04/08, 04/15, 04/22 and 04/29) are the days with the highest numbers of items sold. We can improve the scatterplot by plotting the Saturdays with a different plot symbol. First, we add a categorical variable Saturday to the Excel spreadsheet. This variable has only two values: 1 for the Saturdays and 0 for the weekdays. The Saturday data is easily separated from the weekdays by sorting as shown. A labeled scatterplot can then be obtained by selecting Design Select Data Add to add a series with only the Saturday data..

42 38 Chapter 2 Specify the x and y values and a series name by clicking on the small spreadsheet icons. The additional series will appear in the graph. You will probably want to add a legend to the scatterplot by clicking on the chart and selecting Layout Legend. The scatterplot with Saturdays identified shows that the company is busier on Saturdays. Correlation We can compute the correlation coefficient between two quantitative variables using Excel. The correlation coefficient can be calculated by selecting

43 Looking at Data: Exploring Relationships 39 from the menu. Data Data Analysis Basic Statistics Correlation Below we illustrate a correlation calculation with bird colony data. The data gives, for 13 colonies of sparrow haws, the percent of adult birds in a colony that return from the previous year and the number of new adults that join the colony. To calculate the correlation, the input range should be the variable for which you wish you are needing the correlation. If there are labels in the first row, check the appropriate box. Select a location for the output and click on the OK button. The correlation of the two variables is shown in the table below to be If more than two variables are selected in the Input Range, Excel will include the correlation coefficients between all pairs of variables.

44 40 Chapter 2 Alternatively, the correlation of two variables can be calculated using the CORREL function. Type =CORREL(data range) into any cell or click on the Insert Function button and type CORREL to obtain the dialog box shown below. Least Squares Regression The scatterplot for Duck Worth Wearing shows that there is a strong linear relationship between the number of items sold and the gross sales. To calculate the least squares line of the form y = a + bx from data, right click on a point on the scatterplot, select Add Trendline from the list, select Linear on the Trendline Options, check the box next to Display Equation on the chart, and Display R squared value on the chart if desired.

45 Looking at Data: Exploring Relationships 41 The scatterplot now shows the least sqaures line, the equation for the line (y = 6.595x ) and that r 2 = The slope and intercept can also be found using Excel s SLOPE and INTERCEPT functions. These functions can be typed into a cell or you can click on the Insert Function button. Both SLOPE and INTERCEPT require that you specify the known values of y and x. To find the residual for each point, first calculate the fitted value for each point, then calculate the value of the residual. For each point, the fitted value, = a + bx and the residual is y. Once the residuals have been calculated, the residual plot is just a scatterplot of the residual versus the x variable.

46 42 Chapter 2 Tables for Categorical Variables We can describe relationships between two or more categorical variables using two or threeway tables in Excel. We will use the data on binge drinking by college students. In this data set, we have stored information on 17,096 students classified by gender and whether or not they are frequent binge drinkers. To make a two way table in Excel select Insert Pivot Table from the menu. In the dialog box, select the range of input data and the location where you want the Pivot Table report to be placed as shown below. Click OK and the blank Pivot Table will appear. The Pivot Table Field List will also appear as long as a cell within the Pivot Table is selected.

47 Looking at Data: Exploring Relationships 43 If we view gender as the explanatory variable and frequent binge drinking as the response variable, then we put gender in the columns and frequent binge drinking in the rows. This is easily done by dragging the word Gender into the Column Labels field and the word Drinker into the Row Labels field. Once either Gender or Drinker is dragged into the Values field, the data will appear in the Pivot Table as shown below. For three way tables, an additional variable would be included and dragged into the Report Filter field.

48 44 Chapter 2 Once the two way table has been constructed, marginal and conditional probabilities can be constructed by typing the appropriate formulas into cells. For example, to calculate the proportion of men that are frequent binge drinkers, the formula would be =D7/D8.

49 CHAPTER 3 Producing Data Random Samples Excel allows us to select a simple random sample from a population. To choose a random sample, select Data Data Analysis Sampling from the menu. Specify the input range from which you are sampling, click on the radio button for Random, specify the number of samples, and the output range. In the example below, we wish to select a sample of five randomly selected small business clients for a customer satisfaction survey. The input range for Sampling must be numeric. If you have a list of names instead of numbers, you must create a corresponding list of numbers. To enter a list of numbers into Excel, enter the first few numbers to establish the pattern. Highlight these numbers and then use the fill handle to automatically fill data in worksheet cells. The sample you select may have repeated numbers. You can select a new sample if this is not what you want or you can select a sample larger than needed so that you can skip any repeats. 45