Skip to main content

Joe Celko's Data and Databases

Concepts in Practice

  • 1st Edition - July 27, 1999
  • Latest edition
  • Author: Joe Celko
  • Language: English

Do you need an introductory book on data and databases? If the book is by Joe Celko, the answer is yes. Data and Databases: Concepts in Practice is the first introduct… Read more

Purchase options

Sorry, this title is not available for purchase in your country/region.

World Book Day celebration

Where learning shapes lives

Up to 25% off trusted resources that support research, study, and discovery.

Description

Do you need an introductory book on data and databases? If the book is by Joe Celko, the answer is yes. Data and Databases: Concepts in Practice is the first introduction to relational database technology written especially for practicing IT professionals. If you work mostly outside the database world, this book will ground you in the concepts and overall framework you must master if your data-intensive projects are to be successful. If you're already an experienced database programmer, administrator, analyst, or user, it will let you take a step back from your work and examine the founding principles on which you rely every day-helping you to work smarter, faster, and problem-free.

Whatever your field or level of expertise, Data and Databases offers you the depth and breadth of vision for which Celko is famous. No one knows the topic as well as he, and no one conveys this knowledge as clearly, as effectively-or as engagingly. Filled with absorbing war stories and no-holds-barred commentary, this is a book you'll pick up again and again, both for the information it holds and for the distinctive style that marks it as genuine Celko.

Key features

* Supports its extensive conceptual information with example code and other practical illustrations.
* Explains fundamental issues such as the nature of data and data modeling, and moves to more specific technical questions such as scales, measurements, and encoding.
* Offers fresh, engaging approaches to basic and not-so-basic issues of database programming, including data entities, relationships and values, data structures, set operations, numeric data, character string data, logical data and operations, and missing data among others.
* Covers the conceptual foundations of modern RDBMS technology, making it an ideal choice for students.

Readership

Programmers and other developers working on database applications development.

Table of contents

01.0 Preface


01.1 Organization of the Book


01.2 Corrections and Future Editions





02.0 The Nature of Data


02.1 Data versus Information


02.2 Information versus Wisdom




02.2.1 Innumeracy


02.2.2 Bad Math


02.2.3 Causation and Correlation


02.2.4 Testing the Model against Reality


02.3 Models versus Reality


02.3.1 Errors in Models


02.3.2 Assumptions about Reality




03.0 Entities, Attributes, Values and Relationships


03.1 Entities




03.1.1 Entities as Objects


03.1.2 Entities as Relationships


03.2 Attributes


03.3 Values


03.4 Relationships


03.5 ER Modeling



03.5.1 Mixed Models


03.6 FORM Modeling




04.0 Data Structures


04.1 Sets


04.2 Multi-sets


04.3 Simple Sequential Files


04.4 Lists


04.5 Arrays


04.6 Graphs


04.7 Trees





05.0 Relational Tables


05.1 Subsets


05.2 Union


05.3 Intersection


05.4 Set difference


05.5 Partitioning




05.5.1 Groups


05.5.2 Relational Division


05.6 Duplicates



05.6.1 Allow Duplicates


05.6.2 Disallow Duplicates


05.6.3 Consolidate Duplicates


05.6.4 Uniqueness


05.6.5 Levels of Aggregation


05.7 VIEWs



05.7.1 Updatable VIEWs




06.0 Access Structures


06.1 Indexes




06.1.1 Simple Indexes


06.2 Tree Structured Indexes


06.3 Covering Indexes


06.4 Hashing Functions



06.4.1 Uniform Hashing Function


06.4.2 Perfect Hashing Function


06.4.3 Minimal Hashing Functions


06.4.4 Multi-column Hashing Functions


06.5 Inverted Files


06.6 Bit Vector Indexes


06.7 Mixed Access Methods


06.8 Multiple Table Access Structures


06.9 An Informal Survey of Database Products




07.0 Numeric Data


07.1 Tag Numbers or Absolute Scales


07.2 Cardinal Numbers


07.3 Ordinal Numbers




07.3.1 Arithmetic with Ordinals, Cardinals and Tags


07.4.0 Computer Representations



07.4.1 Exact Numeric Representations


07.4.2 Approximate Numeric Representations versus the Continuum


07.5 Zero, NULL and Math



07.5.1 Zero, NULL and Division


07.5.2 Zero and Powers


07.6 Rounding and Truncating



07.6.1 Applied To Individual Values


07.6.2 Applied To Sets of Values


07.7 Addition and Summation are Different


07.8 Exotic Numbers



07.8.1 Fractions


07.8.2 Repeating Decimals


07.8.3 Infinite Precision Numbers


07.8.4 Complex Numbers


07.8.5 Coordinates




08.0 Character String Data


08.1 National Character Sets


08.2 Problems of String Equality




08.2.2 Empty String versus NULL String


08.3 Length


08.4 Concatenation


08.5 Position


08.6 Reversal


08.7 Pattern matching


08.8 Language as Data



08.8.1 Syntax versus Semantics


08.8.2 Computerized Language Translation




09.0 Logic and Databases


09.1 Boolean or Two Valued Logic


09.2 Multi-valued Logic




09.2.2 SQL-92 and Three Valued Logic


09.3 Fuzzy Sets, Logic and Math



09.3.1 Fuzzy Logic operators


09.3.2 Fuzzy Math Operators


09.4 Constructionist Logic




10.0 Temporal Data


10.1 Temporal Terminology




10.1.1 Events


10.1.2 Durations


10.1.3 Periods


10.1.4 Time in Databases


10.2 A Short History of the Calendar


10.3 The Julian Date


10.4 ISO Temporal Standards



10.4.1 ISO Date Formats


10.4.1.1 Calendar Date Format


10.4.1.2 Ordinal Date Format


10.4.1.3 Week Format


10.5 The Year 2000 Problem



10.5.1 The Odometer Problem


10.5.2 The Leap Year Problem


10.5.3 The Millennium Problem


10.5.4 Weird Dates


10.5.5 Solutions


10.6 ISO Time of Day



10.6.1 Local Time, lawful Time and UTC


10.7 Notes on SQL-92 and Temporal Data



10.7.1 Time Zones in SQL-92


10.7.2 the OVERLAPS Predicate




11.0 Textual Data


11.1 Terminology and the Basics


11.2. Indexing Text


11.3 Text Searching


11.4 The grep() Search Tool


11.5 ANSI and ISO Search Languages




11.5.1 ANSI Common Command Language


11.5.2 The ISO 8777 Language




12.0 Exotic Data


12.1 Nature of Data Manipulation


12.2 Physical versus Perceptual Data




12.2.1 Physical Data


12.2.2 Perceptual Data


12.3 Stored versus Constructed Data


12.4 SQL/MM




13.0 Missing data


13.1 Types of missing Data




13.1.1 Unknown


13.1.2 Not Applicable


13.1.3 Missing


13.1.4 Not Classified


13.1.5 Erroneous Data



13.1.5.1 Illegal Results


13.1.5.2 Error in Representation


13.1.5.3 Limited Values


13.2 The NULL in SQL




14.0 Scales and Measurements


14.1 Range and Granularity


14.1 Scales and Measurements




14.1.1. Range


14.1.2. Granularity and Precision


14.2 Types of Scales


14.3 Scale Conversion


14.4 Derived Units


14.5 Punctuation and Standard Units


14.6 General Guidelines for Using Scales in a Database




15.0 Data Encoding Schemes


15.1 Bad Encoding Schemes




15.1.2 Characteristics of a Bad Encoding Scheme


15.2 Encoding Scheme Types



15.2.1 Enumeration Encoding


15.2.2 Scale Encoding


15.2.3 Abbreviation Encoding


15.2.4 Algorithmic Encoding


15.2.5 Hierarchical Encoding


15.2.6 Vector Encoding



15.2.6.1 Social Security Numbers


15.2.7 Concatenation Encoding


15.3 General Guidelines for Designing Encoding Schemes



15.3.1 Existing Encoding Standards


15.3.2 Allow for Expansion


15.3.3 Use Explicit Missing Values To Avoid Nulls


15.3.4 Translate Codes for the User


15.3.5 Keep the Codes in the Database




16. Check Digits


16.1 Error Detection versus Error Correction


16.2 Mod Functions and Casting Functions




16.2.1 Mod Function


16.2.2 Casting Functions


16.3 Classes of Algorithms



16.3.1 Weighted-Sum Algorithms


16.3.2 Power-Sum Check Digits


16.3.3 Bull Check Digits


16.3.4 Dihedral Five Check Digit


16.4 Check Digit Algorithms



16.4.1 Casting Nines


16.4.2 Dihedral Five Check Digits


16.4.3 Bull Function


16.4.4 Power Function


16.4.5 SKU Code Algorithm


16.4.5 ISBN Function


16.4.6 Code 39 Algorithm




17.0 The Basic Relational Model


17.1 Tables versus Files




17.1.1 Tables as Entities


17.1.2 Tables as Relationships


17.2 Rows versus Records


17.3 Columns versus Fields


17.4 Relationships among Tables within a database


17.5 Codd's Rules for a Relational Database




18.0 Keys


18.1 Types of keys


18.2 Intelligent Keys versus Surrogate Keys.




18.2.1 Arguments against Intelligent Keys



18.2.1.1 Autoincremented Surrogate Keys


18.2.1.2 (MAX() + 1) Surrogate key generators


18.2.1.2 Random Number Surrogate keys


18.2.1.3 Surrogate Key Tables


18.2.2 Arguments for Intelligent Keys



18.2.2.1 Saving Space in the Database


18.2.2.2 Verification of the Data


18.2.2.3 Changing Surrogate Keys is Hard


18.2.2.4 Getting New Keys


18.2.2.5 Clustering Versus Random Distribution


18.2.2.6 Standards Versus "Roll Your Own"


18.2.2.7 Data Warehousing


18.2.2.8 Replicated and Distributed Databases


18.3 Simple versus Compound Keys



18.3.1 Super keys and Overlapping Keys


18.3.2 Queries with Compound keys




19.0 Different Relational Models


19.1 Chris Date = no duplicates, no NULLs


19.2 E.F Codd, RM version I


19.3 E.F Codd, RM version II


19.4 SQL-92 = duplicates, one NULL


19.5 Rick Snodgrass = Temporal SQL


19.5 Roth & Korth = duplicates, one NULL, non-1NF Tables





20.0 Basic Relational Operations


20.1 Projection


20.2 Restriction


20.3 Computations




20.3.1 Expressions Using Other Tables


20.3.1 Simple Operators and Expressions


20.4 Joins


20.4.1 Inner Join


20.4.2 Outer Join



20.4.2.1 Left Outer Join


20.4.2.2 Right Outer Join


20.4.2.3 Full Outer Join


20.4.3 Self Join


20.4.4 Union Join




21.0 Transactions and Concurrency Control


21.1 Sessions


21.2 Transactions and ACID


21.3 Concurrency




21.3.1 The Three Phenomena


21.3.2 The Four Isolation Levels


21.4 Pessimistic Concurrency Control


21.5 Optimistic Concurrency Control


21.6 Logical Concurrency Control


21.7 Deadlock and Livelocks




22.0 Functional Dependencies


22.1 Armstrong's Axioms


22.2 Multi-valued Dependencies


22.3 Mappings





23.0 Normalization


23.1 Anomalies


23.2 First Normal Form (1NF)




23.2.1 Note on Repeated Groups


23.3 Second Normal Form (2NF)


23.4 Third Normal Form (3NF)


23.5 Boyce-Codd Normal Form (BCNF)


23.6 Fourth Normal Form (4NF)


23.7 Fifth Normal Form (5NF)


23.8 Domain-Key Normal Form (DKNF)


23.9 Functional Dependency Axioms


23.10 Practical Hints for Normalization


23.10.1 Case Tools for Normalization




24.0 Denormalization


24.1. Criteria for Denormalization


24.2 Types of Denormalization




24.2.1 Pre-joined Tables


24.2.2 Reports


24.2.3 Mirrored Tables


24.2.4 Table Splitting


24.2.5 Combined Tables


24.2.6 Redundant Data


24.2.7 Repeating Groups


24.2.8 Derivable Data


24.2.9 Hierarchy Tables


24.2.10. Overloaded Datatypes




25.0 Metadata


25.1 Data




25.1.1 Data Assets


25.1.2 Data Engineering Assets


25.1.3 Core Data


25.1.3 Data Management Assets


25.2 Metadata Management



25.2.1 Database Management Systems


25.2.2 Data Dictionary


25.2.3 Data Directory


25.2.4 Data Encyclopedia


25.2.5 Data Element Registry


25.2.6 Repository


25.3 Data Dictionary Environments



25.3.1 Application Data Dictionary


25.3.2 Functional Area Data Dictionaries


25.3.3 The ITS Data Dictionary/Registry


25.4 NCITS L8 Standards



25.4.1 Naming Data Elements



25.4.1.1 Levels of Abstraction


25.4.2 Registering Standards

Product details

  • Edition: 1
  • Latest edition
  • Published: July 27, 1999
  • Language: English

About the author

JC

Joe Celko

Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.

Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas.

He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.

Affiliations and expertise
Independent Consultant, Austin, Texas