Classification is an integral part of many fields of Computer Science. Several approaches exist that capture and manipulate classification information in order to construct a specific classification model. These approaches are often tightly coupled to particular learning strategies, to special data structures for capturing the models, and to the way common problems such as fragmentation, replication, and model overfitting are addressed. To unify these approaches, we introduce Decision Algebra, which defines classification models as higher-order decision functions, abstracting from their implementations as decision trees (or similar structures), decision rules, decision tables, etc. Decision Algebra defines operations for learning, applying, storing, merging, approximating, and manipulating classification models, along with general algebraic laws that hold regardless of the implementation used. This abstraction has several advantages. First, several useful Decision Algebra operations (e.g., learning and deciding) can be derived from the implementation of a few core operations (including merging and approximating). Second, applications that use classification can be defined independently of the underlying approach. Third, certain properties of Decision Algebra operations can be proved independently of the actual implementation. For instance, we show that the merger of a series of probably accurate decision functions is even more accurate, which can be exploited for efficient and general online learning. As a proof of concept for Decision Algebra, we compare decision trees with decision graphs, which capture classification models in a non-redundant way and provide an efficient implementation of the Decision Algebra core operations. Compared to classical decision tree implementations, decision graphs are 20% faster in learning and classification without loss of accuracy and reduce memory consumption by 44%.
These results come from experiments on a number of standard benchmark data sets, comparing the accuracy, access time, and size of decision graphs with those of decision trees constructed by the standard C4.5 algorithm. Finally, to test our hypothesis that merging decision functions increases accuracy, we merged a series of decision graphs constructed over the data sets. The results show that the accuracy of the merged decision graph increases at each step, with a final accuracy gain of up to 16%.
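
To make the idea of deriving learning and deciding from a core merge operation more concrete, the following minimal Python sketch may help. It is an illustration under strong assumptions, not the formal definition of Decision Algebra: a decision function is represented here as a plain lookup table from attribute tuples to class counts, and the names `leaf`, `merge`, `learn`, and `decide` are hypothetical helpers chosen for this example.

```python
from collections import Counter
from functools import reduce


def leaf(attributes, label):
    """Decision function built from one labeled example (hypothetical helper)."""
    return {tuple(attributes): Counter({label: 1})}


def merge(f, g):
    """Assumed merge: pointwise sum of class counts; the inputs are not mutated."""
    merged = {key: Counter(counts) for key, counts in f.items()}
    for key, counts in g.items():
        merged.setdefault(key, Counter()).update(counts)
    return merged


def learn(examples):
    """Learning derived from merging: fold merge over single-example functions."""
    return reduce(merge, (leaf(x, y) for x, y in examples), {})


def decide(f, attributes):
    """Return the most frequent class for the attributes, or None if unseen."""
    counts = f.get(tuple(attributes))
    return counts.most_common(1)[0][0] if counts else None


# Two decision functions learned on separate batches, then merged.
first = learn([(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")])
second = learn([(("sunny", "hot"), "no"), (("sunny", "hot"), "yes")])
combined = merge(first, second)
print(decide(combined, ("sunny", "hot")))
```

In this toy representation, merging two separately learned functions pools their evidence, which loosely mirrors the online-learning use of merging described above. A tree- or graph-based implementation would replace the lookup table while keeping the same operations.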