Use of Computerized Systems in Healthcare Epidemiology
Keith F. Woeltje
INTRODUCTION
Computers have become ubiquitous in modern society, and computer power has increased steadily. It takes about the same amount of computing to answer one Google Search query as all the computing done—in flight and on the ground—for the entire Apollo space program (1). The “Meaningful Use” program mandated by the U.S. government has spurred much more rapid adoption of electronic medical records in hospitals, with a concomitant increase in data that are, at least potentially, available electronically. While the use of computers has become ubiquitous in hospitals, the degree of use continues to vary widely.
BASIC COMPUTER SUPPORT
PERSONAL COMPUTERS
Personal computers (PCs) started as machines built by geeky enthusiasts in the 1970s, but became widely commercially available in the 1980s. In the first years of the 21st century, computers have become commodity items. PC prices have fallen steadily while computing power has increased exponentially. Until a few years ago, considerable thought had to be given to buying a PC in balancing price, capability, and upgradeability. Now, even the most inexpensive PCs and laptops available at large electronics stores or online can readily handle basic computer needs.
A computer and its operating system alone would not be able to accomplish very much. Additional software is needed to perform basic tasks, such as word processing. Although standalone programs of productivity software are available, they are typically obtained in suites with all components somewhat integrated, allowing consistency of use across programs. Microsoft Office has become the most commonly used office suite, but a variety of competitors exist, including the free, open-source LibreOffice (www.libreoffice.org). Most institutions prefer to have all employees use the same program for ease of exchanging information, but most of these programs can convert between the most common formats. Online “cloud” versions of these applications are also available and are discussed below.
Spreadsheet software (e.g., Microsoft Excel, LibreOffice Calc) is designed primarily to manipulate numbers, not text. Spreadsheets are extremely useful to healthcare epidemiologists for calculating rates and doing basic statistics. The column and row structure of spreadsheets also allows them to be used as simple databases, such as a line-listing for an outbreak investigation. Spreadsheet software can also make graphs for visualization of data, such as run charts of infection rates over time. These graphs then can be imported into a word processing document or slide presentation for distribution to others. Such graphs are easy to make, and by helping staff visualize the data, make for far more compelling discussions.
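The rate calculation described above can also be sketched in a few lines of code. The following is a minimal illustration, assuming the common convention of expressing device-associated infection rates per 1,000 device-days; the function name and the monthly figures are hypothetical and are not taken from any real surveillance data.

```python
def rate_per_1000(events: int, device_days: int) -> float:
    """Return the number of events per 1,000 device-days."""
    if device_days <= 0:
        raise ValueError("device_days must be positive")
    return 1000.0 * events / device_days


# Hypothetical monthly data: (month, infections, central line-days)
monthly = [("Jan", 3, 1250), ("Feb", 1, 1100), ("Mar", 2, 1300)]

# One rate per month, rounded for display; these are the points
# that would be plotted on a run chart.
rates = [(month, round(rate_per_1000(events, days), 2))
         for month, events, days in monthly]

for month, rate in rates:
    print(f"{month}: {rate} infections per 1,000 line-days")
```

A spreadsheet formula does the same arithmetic; the point of the sketch is only that the numerator, the denominator, and the multiplier are kept explicit.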
Desktop relational database software (e.g., Microsoft Access, LibreOffice Base) is not included in many basic editions of office suite software, but may be available in more extended or “professional” editions. Although in some respects more difficult to use than other desktop computer productivity software, database software can offer healthcare epidemiologists significant advantages over other methods of storing data, such as using spreadsheet software. For example, suppose blood culture data are being stored as part of a study and a given blood culture grows >1 organism. In a spreadsheet, either several columns must be included (e.g., “organism_1,” “organism_2”) so that all data from a given culture fit on one row, or several separate rows must be used to record the information from a single culture. Both alternatives may introduce difficulties in subsequent data retrieval and analysis. A relational database, however, would allow the information to be structured in a manner that avoids these issues, for instance by storing each culture once in one table and each organism in a separate, linked table.
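One way to picture the relational approach to the blood culture example is the sketch below. It uses SQLite through Python's standard library rather than a desktop database product, and the table names, column names, and sample records are all hypothetical choices made for illustration.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# One row per culture, and one row per organism isolated from a culture.
# A culture with several organisms needs no extra columns and is not
# duplicated across rows of the culture table.
conn.executescript("""
CREATE TABLE culture (
    culture_id   INTEGER PRIMARY KEY,
    patient_id   TEXT NOT NULL,
    collected_on TEXT NOT NULL
);
CREATE TABLE culture_organism (
    culture_id INTEGER NOT NULL REFERENCES culture(culture_id),
    organism   TEXT NOT NULL,
    PRIMARY KEY (culture_id, organism)
);
""")

# Hypothetical sample data: culture 1 grew two organisms, culture 2 grew one.
conn.executemany(
    "INSERT INTO culture VALUES (?, ?, ?)",
    [(1, "P001", "2012-03-01"), (2, "P002", "2012-03-02")],
)
conn.executemany(
    "INSERT INTO culture_organism VALUES (?, ?)",
    [(1, "Staphylococcus aureus"),
     (1, "Enterococcus faecalis"),
     (2, "Escherichia coli")],
)

# Retrieve cultures with more than one organism.
polymicrobial = conn.execute("""
    SELECT c.culture_id, COUNT(*) AS n_organisms
    FROM culture AS c
    JOIN culture_organism AS o ON o.culture_id = c.culture_id
    GROUP BY c.culture_id
    HAVING COUNT(*) > 1
""").fetchall()

print(polymicrobial)
```

With this structure, a question such as “which cultures were polymicrobial?” is a single query, whereas the spreadsheet layouts would require checking several columns or collapsing several rows.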
Because most office productivity software can be used at a basic level without any special training, the benefits of formal training are frequently under-appreciated. Various levels of training classes may be offered by larger organizations or may be available at local community colleges. Online training or structured textbooks provide additional alternatives. The time and expense spent on training will be returned many times in productivity gains. In particular, database software requires some training to use it appropriately. An excellent introduction to databases at a conceptual level (not tied to any particular program) is Database Design for Mere Mortals (2).
NETWORKED COMPUTERS
Although a stand-alone PC can bring a huge productivity boost to a healthcare epidemiology department, the potential utility of the computer increases significantly when it is networked to other computers. The PC then is no longer limited to data that have been entered by hand or files brought to it on some form of solid media. Instead, the PC can now share information with other computers at high speeds. Essentially, all hospitals now have networks that connect the computers in their various departments together. These networks, in turn, are typically connected to the Internet, providing an electronic connection between the hospital and the rest of the world.
INTERNET AND THE WORLD WIDE WEB
The Internet is really a network of computer networks (3). Precursors to the Internet began as a research project in the late 1960s sponsored by the U.S. Department of Defense’s Advanced Research Projects Agency (DARPA). The Internet as we know it started on January 1, 1983 (4), and has grown exponentially in the intervening decades.
For many people, the World Wide Web (“the Web”) is synonymous with the Internet. The World Wide Web started in 1990 as a research project at the European Organization for Nuclear Research (CERN) (5). The purpose of the Web is to provide access to online documents (Web pages), including an easy mechanism for one page to refer to another. These pages can be viewed with software called Web browsers. Since its inception, the Web has grown beyond simple text to include images and multimedia presentations and allow downloads of files of various types.
For healthcare epidemiologists, the Web provides a wealth of resources. Many professional organizations have Web sites that provide news, guidelines, and links to other resources for both members and nonmembers. Examples include the Society for Healthcare Epidemiology of America (SHEA, www.shea-online.org), the Association of Professionals in Infection Control and Epidemiology, Inc. (APIC, www.apic.org), the Community and Hospital Infection Control Association—Canada (CHICA—Canada, www.chica.org), the Hospital Infection Society (HIS, www.his.org.uk), and the International Federation of Infection Control (IFIC, www.theific.org). Government organizations such as the Centers for Disease Control and Prevention (CDC, www.cdc.gov) provide a wealth of resources including guidelines, information on specific diseases and outbreaks, and reference materials. State and local health departments may also have Web sites that provide valuable information on local issues. The Web also provides easy access to information from companies regarding their products. Literature searches of the U.S. National Library of Medicine’s MEDLINE database can also be conducted on the Web using the PubMed system (www.pubmed.gov).
One of the most exciting developments of the Web is the explosion of educational opportunities. Many universities offer online classes, and it is possible to earn an advanced degree from a respectable institution through Web-based learning. There is also an abundance of noncredit, but free, educational sites. One example is the Supercourse, a Web-based set of >5,000 lectures in epidemiology (available at www.pitt.edu/~super1/) hosted by the University of Pittsburgh. Other universities are making their classroom materials available online for noncredit use, for example Coursera (coursera.org) sponsored by Stanford, and edX (www.edX.org) sponsored by Massachusetts Institute of Technology (MIT) and Harvard. Besides universities, start-up companies offer similar classes, bringing in some of the best university professors from around the world to do individual courses (e.g., Udacity [www.udacity.com]).
Some hospitals and other organizations block or severely limit access of employees to the Web. Healthcare epidemiology and infection prevention programs can make a strong argument for having relatively unfettered Web access to do their jobs correctly. This typically requires going through a formal request process.
E-MAIL
Electronic mail, or e-mail, is probably even more popular than the World Wide Web in terms of total number of users. Sending files as attachments in an e-mail has become a preferred method of sending data from one person to another. In addition, e-mail can serve as a means to alert healthcare epidemiologists to significant issues in a timely fashion. Organizations (e.g., SHEA and APIC) send e-mail alerts to their members when warranted. Healthcare professionals also can sign up for alerts and updates on terrorism and emergency response from the CDC at emergency.cdc.gov/coca/subscribe.asp. In addition, the CDC’s Division of Healthcare Quality Promotion has a Rapid Notification System for Healthcare Professionals that sends out e-mail alerts related to outbreaks and product recalls. The sign-up page is at www2a.cdc.gov/ncidod/hip/rns/hip_rns_subscribe.html.
E-mail “list-serv” software allows e-mail to be sent to many persons by sending an e-mail to a particular e-mail address. This facilitates group discussions via e-mail. Popular e-mail groups for healthcare epidemiologists include ProMED-mail (sign up at www.promedmail.org), which is sponsored by the International Society for Infectious Diseases, and the Emerging Infections Network (request sign-up information from ein@uiowa.edu), which is sponsored by the Infectious Diseases Society of America (IDSA) and the CDC. Because of the rise of unsolicited commercial e-mails (termed “spam”), many institutions have “spam filters” in place to reduce the influx of these nuisance messages. Unfortunately, these filters may filter out legitimate e-mail. List-serv e-mails, in particular, may be filtered out as “bulk e-mail.” Local information systems personnel should be consulted to determine what steps are needed to ensure that desired e-mail reaches the recipient.
ONLINE SOFTWARE SERVICES
An increasing variety of software is available to use online. This includes general productivity software (e.g., Google Docs, Microsoft Office 365), e-mail (e.g., Gmail, Yahoo! Mail), and others. These often are termed cloud applications because they live in the “data cloud” of the Internet. Advantages include access from any computer with Internet access and reliable backups. A major disadvantage is the requirement for Internet access (although this is often overcome by local software that can sync up with the cloud service once Internet access is again available). A more significant concern in the healthcare setting is data security. The same ease of access that makes these services convenient makes them potentially insecure. Hospitals and universities typically have strong guidelines regarding data security. This is, in part, to ensure compliance with Federal regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) privacy and security rules. Guidance should be sought before using online services for any data that might be sensitive.
GENERAL STATISTICAL SOFTWARE
For more extensive analysis than can be done with a spreadsheet, a healthcare epidemiologist could use general-purpose statistical software. A wide range of programs exists, from basic statistical packages included with some statistics texts to expensive, very complex programs that can perform even the most esoteric analysis. Widely used statistics packages include SAS, SPSS, and Stata. Many other very capable commercial programs are available. R (www.r-project.org) is an open-source general-purpose statistical package. EpiTools (available from cran.r-project.org/web/packages/epitools/index.html) is a set of tools for use with the R statistical package designed to add functions of use to epidemiologists. Although some of these programs allow for some form of direct data entry, they really are designed to import data that were entered using another program (e.g., a database or spreadsheet). Thus, using these programs for routine healthcare epidemiology work takes some effort. A user who is facile with a given program may choose to do basic analyses in that program. These statistical programs are overkill for most healthcare epidemiology programs and are used primarily in research settings or by facilities that have dedicated statistical analysts.
GENERAL EPIDEMIOLOGY SOFTWARE
Some programs occupy a middle ground between the general statistical software packages and more fully developed surveillance programs. They provide more support for data entry and management than general-purpose statistical programs, while providing more flexible statistical analysis than most specialty infection prevention/healthcare-epidemiology software packages. The primary downside is the considerable work that would be necessary to set up specific functionality that comes “ready-made” with healthcare epidemiology specific software. The benefit is the ability to design in the specific functionality the user desires.
Epi Info
Epi Info (www.cdc.gov/epiinfo/) is a program designed and distributed free by the CDC. Early versions of the program were designed to run on Microsoft MS-DOS to assist CDC Epidemic Intelligence Service (EIS) Officers in investigating outbreaks. It was steadily upgraded, and in the late 1990s, a version for Microsoft Windows was finally released. Initially named “Epi Info 2000,” to distinguish it from the MS-DOS version, the name has subsequently been simplified to “Epi Info” again. Development has made this an increasingly sophisticated product.
Epi Info does not generate specific reports or graphs designed for healthcare epidemiologists. Rather, it is a collection of tools that can be used in a wide variety of ways, including collecting and analyzing epidemiologic data. Functions are available to design data entry screens. Once designed, data can be entered, stored, and retrieved using the program. Internal data are stored in a relational database, which allows for sophisticated data storage, if that is needed. Double data entry (see later “Data Entry” section) for ensuring data integrity is supported.
Epi Info provides an extensive range of statistical analysis tools. Analysis is not limited to data entered and stored using the software. The program can import and analyze data that have been stored in a number of database and spreadsheet formats. Results can also be displayed in a variety of graphical formats. Advanced statistical analysis, including logistic regression and Kaplan-Meier survival analysis, is available.
In addition to statistical analysis, Epi Info contains a module for mapping of data using geographic information system (GIS) standards. Overall, Epi Info provides a framework for developing quite sophisticated systems for gathering and analyzing healthcare epidemiology data.
EpiData
EpiData (www.epidata.dk) started as a Windows-friendly data entry program for the older MS-DOS versions of Epi Info. This program has now become EpiData Entry. It enables the user to design data entry screens and then enter and manage data. EpiData Entry supports double data entry. Once entered, the data can be exported in a variety of formats for additional analysis. This includes export into SAS, SPSS, and Stata formats. As such, EpiData Entry is a good companion program to a general statistical package to provide the data entry and management functions the statistical programs lack.
EpiData Analysis is a newer program released in the fall of 2005. It provides additional data management functions beyond those offered by EpiData Entry. It also provides some basic descriptive statistical analysis and graphical functions, including statistical process control (SPC) charts. Although ongoing active development was not apparent as of mid-2012, it remains a solid option for some users.
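To give a sense of what an SPC chart involves, the sketch below computes the center line and 3-sigma control limits for a u-chart, one common SPC chart for rates with varying denominators. It is not EpiData Analysis output and makes no claim about how that program computes its charts; the function name and the monthly figures are hypothetical.

```python
import math


def u_chart(events, exposure):
    """Return (center_line, rows) for a u-chart.

    events   -- event counts per period (e.g., infections per month)
    exposure -- matching denominators in the chosen unit
                (here: thousands of device-days)
    Each row is (observed_rate, lower_limit, upper_limit).
    """
    if len(events) != len(exposure) or not events:
        raise ValueError("events and exposure must be non-empty and equal length")
    u_bar = sum(events) / sum(exposure)  # center line: overall rate
    rows = []
    for e, n in zip(events, exposure):
        sigma = math.sqrt(u_bar / n)         # Poisson-based standard error
        lower = max(0.0, u_bar - 3 * sigma)  # a rate cannot fall below zero
        upper = u_bar + 3 * sigma
        rows.append((e / n, lower, upper))
    return u_bar, rows


# Hypothetical data: infections and thousands of central line-days per month.
events = [3, 1, 2, 9]
exposure = [1.25, 1.10, 1.30, 1.20]

center, rows = u_chart(events, exposure)
for rate, lower, upper in rows:
    flag = "signal" if rate > upper or rate < lower else "in control"
    print(f"rate={rate:.2f}  limits=({lower:.2f}, {upper:.2f})  {flag}")
```

Because the limits depend on each period's denominator, months with fewer device-days get wider limits, which is why a u-chart is often preferred to a fixed threshold for infection rates.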
Other Software and Resources
Sometimes an epidemiologist just needs to do a quick calculation, such as a 2 × 2 table or a sample size calculation. Such a simple task can actually be tedious to do in a full-fledged statistical package. For this purpose, Epi Info has a StatCalc module that is designed for such quick calculations. A similar program is EpiCalc 2000, written by Mark Myatt (www.brixtonhealth.com; this Web site also contains links to a wide variety of other programs that may be of use to healthcare epidemiologists). For those with ready access to the Internet, the OpenEpi project (www.openepi.com) provides a wealth of epidemiology analysis tools available in any Web browser.
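As an illustration of the kind of quick 2 × 2 table calculation mentioned above, the following sketch computes a risk ratio and an odds ratio with an approximate 95% confidence interval (the log-based Woolf method). It is not the method used by StatCalc, EpiCalc 2000, or OpenEpi, which may use other interval methods; the function name, cell labels, and counts are hypothetical.

```python
import math


def two_by_two(a: int, b: int, c: int, d: int) -> dict:
    """Summarize a 2 x 2 table.

    a = exposed, ill        b = exposed, not ill
    c = unexposed, ill      d = unexposed, not ill
    """
    if min(a, b, c, d) <= 0:
        # The simple formulas below break down when any cell is zero.
        raise ValueError("all four cells must be positive")
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR), then an approximate 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return {
        "risk_ratio": risk_ratio,
        "odds_ratio": odds_ratio,
        "or_ci_95": (ci_low, ci_high),
    }


# Hypothetical outbreak data: 20 of 100 exposed and 10 of 100 unexposed were ill.
result = two_by_two(20, 80, 10, 90)
print(f"Risk ratio: {result['risk_ratio']:.2f}")
print(f"Odds ratio: {result['odds_ratio']:.2f}")
print("95% CI for OR: ({:.2f}, {:.2f})".format(*result["or_ci_95"]))
```

For small counts or tables with a zero cell, exact methods are more appropriate than this approximation.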
Many more software packages that are of use to healthcare epidemiologists exist than can be described here. Many of the Web pages mentioned provide links to additional resources. A source of many links for statistical analysis is statpages.org. A quick search using an Internet search engine can yield a wide variety of additional options.
Commercial Infection Prevention Software
A number of software packages have been designed specifically for healthcare epidemiology programs. Such software is available from a number of vendors. These software packages allow the user to enter surveillance data (both denominator and numerator data) or gather data directly from electronic sources. They will then generate reports and graphs on rates; some of these programs include the ability to compare a facility’s rates with benchmarks from the CDC’s National Healthcare Safety Network (NHSN), and increasingly they have the ability to submit data to NHSN as well.
The advantage of these specialized infection control (IC)/healthcare epidemiology software packages is that they are designed with the needs of an infection preventionist (IP)/healthcare epidemiologist in mind. Although they are somewhat configurable, they require minimal setup to generate useful information. This narrow specialization is also their downside; if a user needs a specific functionality that is not included in the package, the user must use other software. Vendors are eager to hear what features their users would like to see, but a feature is typically added only if there is substantial interest. A wide variety of options are available, and the features offered by each program continually increase. Each program may have features not offered by any of the others, and these programs are increasingly being integrated into suites of other programs, such as patient safety surveillance software. Facilities interested in such software should determine what features they require, request information from each vendor, and do a careful cost-benefit analysis. Although currently these programs are independent of electronic medical records (EMRs), a logical future step would be the integration of at least some of this functionality into EMR systems.