Thursday, November 15, 2018

Does the Selection of Data Structures and Algorithms Really Matter?

YES.  Sorry, I didn’t mean to scare you there, but yes, data structure and algorithm selection matters in the design and development of efficient programs.  Shaffer (2013) addresses the skepticism facing this matter considering the fact that computing power has been increasing at a rapid rate.  So wouldn’t the increase in computing power negate the need to optimize software through the use of appropriate data structures and algorithms?  No.  Increased computing power combined with optimized code allows for solving problems of even greater complexity than we might have previously imagined.

SO, what are data structures and algorithms?  A data structure is a way of organizing data, and includes associated operations (Shaffer, 2013).  Examples of data structures include arrays, lists, stacks, queues, trees, and hash tables, as well as others.  Algorithms are a series of steps that combined, carry out a specific task, such as sorting a list (Lysecky, 2015).

BEFORE selecting a data structure and/or algorithm, it is important for programmers to analyze the goals and requirements of the project.  According to Shaffer (2013), there are three main steps to achieve this:
     1) Analyze your problem to determine the basic operations that
          must be supported.  Examples of basic operations include
          inserting a data item into the data structure, deleting a data
          item from the data structure, and finding a specified data
          item.
     2) Quantify the resource constraints for each operation.
     3) Select the data structure that best meets these
          requirements.
Chances are that one data structure or algorithm is not going to be best in all circumstances.  Otherwise, data structures and algorithms that remain in use today would have already become obsolete.  As Shafer (2013) states, “Each data structure and each algorithm has costs and benefits.”

WHEN considering appropriate data structures and algorithms for a project, programmers require a way to predict run-time, as well as to compare one data structure or algorithm to another.  This is often achieved by evaluating time and space complexities.  Time complexity is the amount of time required for an algorithm to run, while space complexity is the amount of memory required for execution.  Both of these metrics are commonly expressed through the use of Big O notation, which is a function that describes how time and space complexity are affected by the number of elements (University of Texas, n.d.).

ONE example of how input size affects time complexity is described in Complexity Analysis (n.d.).  In the scenario presented, the time complexity of a data set using a merge sort algorithm is compared to that of the same data set using a selection sort algorithm.  When implementing the two algorithms on a small data set, the selection sort algorithm is more efficient.  However, as n (the number of elements) becomes very large, merge sort is a much better alternative.  In fact, with a data set containing 10,000,000 elements, the selection sort requires 100 trillion accesses, or about 11.5 days to complete.  On the other hand, the same data set run with a merge sort algorithm requires 1.2 billion accesses, for a total sorting time of 12 seconds!  Clearly, algorithm selection in this scenario has large implications.

SIMILARLY, the benefit of selecting the correct data structure is illustrated in an example provided by Shaffer (2013).  Shaffer describes a scenario in which a programmer needs to select a data structure that organizes account and transactional information for a bank.  Since opening new accounts does not necessitate a very expedient process (clients are willing to invest time to open a new account), but withdrawing or depositing money from an ATM does, these are considerations that need to be made when selecting a data structure.  In addition, closing accounts do not necessitate expediency, since most clients are not concerned if it takes 24 hours, as long as they can withdraw the money first.  In this scenario, Shaffer recommends a hash table, since the ability to search is of high priority (to make deposits and withdrawals), insertion (opening a new account) is medium priority, and deletion (closing an account) has low priority.

THEREFORE, data structure and algorithm selection does matter, because the strategic design of applications can reduce time and space complexity, and may even improve the user experience.  As Shaffer (2013) states, “using the proper data structure can make the difference between a program running in a few seconds and one requiring many days.”

Resources

Lysecky, R., Vahid, F., Lysecky, S., & Givargis, T. (2015). Data structures essentials. Retrieved from https://zybooks.zyante.com/#/zybook/DataStructuresEssentialsR25/chapter/1/section/3

Shaffer, C. A. (2013). Data structures and algorithm analysis (Edition 3.2). Retrieved from http://people.cs.vt.edu/~shaffer/Book/JAVA3elatest.pdf

University of Texas at Austin Computer Science Department.  (n.d.).  Complexity analysis Retrieved from http://www.cs.utexas.edu/users/djimenez/utsa/cs1723/lecture2.html

Zeigler, J. (2004). Time, complexity, space complexity, and the O-notation. Retrieved from http://www.leda-tutorial.org/en/official/ch02s02s03.html