Joe Celko's Data and Databases

Concepts in Practice

1st Edition - July 27, 1999
Latest edition
Author: Joe Celko
Language: English

Do you need an introductory book on data and databases? If the book is by Joe Celko, the answer is yes. Data and Databases: Concepts in Practice is the first introduct… Read more

Purchase options

World Book Day celebration

Where learning shapes lives

Up to 25% off trusted resources that support research, study, and discovery.

Explore resources

Description

Do you need an introductory book on data and databases? If the book is by Joe Celko, the answer is yes. Data and Databases: Concepts in Practice is the first introduction to relational database technology written especially for practicing IT professionals. If you work mostly outside the database world, this book will ground you in the concepts and overall framework you must master if your data-intensive projects are to be successful. If you're already an experienced database programmer, administrator, analyst, or user, it will let you take a step back from your work and examine the founding principles on which you rely every day-helping you to work smarter, faster, and problem-free.

Whatever your field or level of expertise, Data and Databases offers you the depth and breadth of vision for which Celko is famous. No one knows the topic as well as he, and no one conveys this knowledge as clearly, as effectively-or as engagingly. Filled with absorbing war stories and no-holds-barred commentary, this is a book you'll pick up again and again, both for the information it holds and for the distinctive style that marks it as genuine Celko.

Key features

* Supports its extensive conceptual information with example code and other practical illustrations.
* Explains fundamental issues such as the nature of data and data modeling, and moves to more specific technical questions such as scales, measurements, and encoding.
* Offers fresh, engaging approaches to basic and not-so-basic issues of database programming, including data entities, relationships and values, data structures, set operations, numeric data, character string data, logical data and operations, and missing data among others.
* Covers the conceptual foundations of modern RDBMS technology, making it an ideal choice for students.

Readership

Programmers and other developers working on database applications development.

01.0 Preface

01.1 Organization of the Book

01.2 Corrections and Future Editions

02.0 The Nature of Data

02.1 Data versus Information

02.2 Information versus Wisdom

02.2.1 Innumeracy

02.2.2 Bad Math

02.2.3 Causation and Correlation

02.2.4 Testing the Model against Reality

02.3 Models versus Reality

02.3.1 Errors in Models

02.3.2 Assumptions about Reality

03.0 Entities, Attributes, Values and Relationships

03.1 Entities

03.1.1 Entities as Objects

03.1.2 Entities as Relationships

03.2 Attributes

03.3 Values

03.4 Relationships

03.5 ER Modeling

03.5.1 Mixed Models

03.6 FORM Modeling

04.0 Data Structures

04.1 Sets

04.2 Multi-sets

04.3 Simple Sequential Files

04.4 Lists

04.5 Arrays

04.6 Graphs

04.7 Trees

05.0 Relational Tables

05.1 Subsets

05.2 Union

05.3 Intersection

05.4 Set difference

05.5 Partitioning

05.5.1 Groups

05.5.2 Relational Division

05.6 Duplicates

05.6.1 Allow Duplicates

05.6.2 Disallow Duplicates

05.6.3 Consolidate Duplicates

05.6.4 Uniqueness

05.6.5 Levels of Aggregation

05.7 VIEWs

05.7.1 Updatable VIEWs

06.0 Access Structures

06.1 Indexes

06.1.1 Simple Indexes

06.2 Tree Structured Indexes

06.3 Covering Indexes

06.4 Hashing Functions

06.4.1 Uniform Hashing Function

06.4.2 Perfect Hashing Function

06.4.3 Minimal Hashing Functions

06.4.4 Multi-column Hashing Functions

06.5 Inverted Files

06.6 Bit Vector Indexes

06.7 Mixed Access Methods

06.8 Multiple Table Access Structures

06.9 An Informal Survey of Database Products

07.0 Numeric Data

07.1 Tag Numbers or Absolute Scales

07.2 Cardinal Numbers

07.3 Ordinal Numbers

07.3.1 Arithmetic with Ordinals, Cardinals and Tags

07.4.0 Computer Representations

07.4.1 Exact Numeric Representations

07.4.2 Approximate Numeric Representations versus the Continuum

07.5 Zero, NULL and Math

07.5.1 Zero, NULL and Division

07.5.2 Zero and Powers

07.6 Rounding and Truncating

07.6.1 Applied To Individual Values

07.6.2 Applied To Sets of Values

07.7 Addition and Summation are Different

07.8 Exotic Numbers

07.8.1 Fractions

07.8.2 Repeating Decimals

07.8.3 Infinite Precision Numbers

07.8.4 Complex Numbers

07.8.5 Coordinates

08.0 Character String Data

08.1 National Character Sets

08.2 Problems of String Equality

08.2.2 Empty String versus NULL String

08.3 Length

08.4 Concatenation

08.5 Position

08.6 Reversal

08.7 Pattern matching

08.8 Language as Data

08.8.1 Syntax versus Semantics

08.8.2 Computerized Language Translation

09.0 Logic and Databases

09.1 Boolean or Two Valued Logic

09.2 Multi-valued Logic

09.2.2 SQL-92 and Three Valued Logic

09.3 Fuzzy Sets, Logic and Math

09.3.1 Fuzzy Logic operators

09.3.2 Fuzzy Math Operators

09.4 Constructionist Logic

10.0 Temporal Data

10.1 Temporal Terminology

10.1.1 Events

10.1.2 Durations

10.1.3 Periods

10.1.4 Time in Databases

10.2 A Short History of the Calendar

10.3 The Julian Date

10.4 ISO Temporal Standards

10.4.1 ISO Date Formats

10.4.1.1 Calendar Date Format

10.4.1.2 Ordinal Date Format

10.4.1.3 Week Format

10.5 The Year 2000 Problem

10.5.1 The Odometer Problem

10.5.2 The Leap Year Problem

10.5.3 The Millennium Problem

10.5.4 Weird Dates

10.5.5 Solutions

10.6 ISO Time of Day

10.6.1 Local Time, lawful Time and UTC

10.7 Notes on SQL-92 and Temporal Data

10.7.1 Time Zones in SQL-92

10.7.2 the OVERLAPS Predicate

11.0 Textual Data

11.1 Terminology and the Basics

11.2. Indexing Text

11.3 Text Searching

11.4 The grep() Search Tool

11.5 ANSI and ISO Search Languages

11.5.1 ANSI Common Command Language

11.5.2 The ISO 8777 Language

12.0 Exotic Data

12.1 Nature of Data Manipulation

12.2 Physical versus Perceptual Data

12.2.1 Physical Data

12.2.2 Perceptual Data

12.3 Stored versus Constructed Data

12.4 SQL/MM

13.0 Missing data

13.1 Types of missing Data

13.1.1 Unknown

13.1.2 Not Applicable

13.1.3 Missing

13.1.4 Not Classified

13.1.5 Erroneous Data

13.1.5.1 Illegal Results

13.1.5.2 Error in Representation

13.1.5.3 Limited Values

13.2 The NULL in SQL

14.0 Scales and Measurements

14.1 Range and Granularity

14.1 Scales and Measurements

14.1.1. Range

14.1.2. Granularity and Precision

14.2 Types of Scales

14.3 Scale Conversion

14.4 Derived Units

14.5 Punctuation and Standard Units

14.6 General Guidelines for Using Scales in a Database

15.0 Data Encoding Schemes

15.1 Bad Encoding Schemes

15.1.2 Characteristics of a Bad Encoding Scheme

15.2 Encoding Scheme Types

15.2.1 Enumeration Encoding

15.2.2 Scale Encoding

15.2.3 Abbreviation Encoding

15.2.4 Algorithmic Encoding

15.2.5 Hierarchical Encoding

15.2.6 Vector Encoding

15.2.6.1 Social Security Numbers

15.2.7 Concatenation Encoding

15.3 General Guidelines for Designing Encoding Schemes

15.3.1 Existing Encoding Standards

15.3.2 Allow for Expansion

15.3.3 Use Explicit Missing Values To Avoid Nulls

15.3.4 Translate Codes for the User

15.3.5 Keep the Codes in the Database

16. Check Digits

16.1 Error Detection versus Error Correction

16.2 Mod Functions and Casting Functions

16.2.1 Mod Function

16.2.2 Casting Functions

16.3 Classes of Algorithms

16.3.1 Weighted-Sum Algorithms

16.3.2 Power-Sum Check Digits

16.3.3 Bull Check Digits

16.3.4 Dihedral Five Check Digit

16.4 Check Digit Algorithms

16.4.1 Casting Nines

16.4.2 Dihedral Five Check Digits

16.4.3 Bull Function

16.4.4 Power Function

16.4.5 SKU Code Algorithm

16.4.5 ISBN Function

16.4.6 Code 39 Algorithm

17.0 The Basic Relational Model

17.1 Tables versus Files

17.1.1 Tables as Entities

17.1.2 Tables as Relationships

17.2 Rows versus Records

17.3 Columns versus Fields

17.4 Relationships among Tables within a database

17.5 Codd's Rules for a Relational Database

18.0 Keys

18.1 Types of keys

18.2 Intelligent Keys versus Surrogate Keys.

18.2.1 Arguments against Intelligent Keys

18.2.1.1 Autoincremented Surrogate Keys

18.2.1.2 (MAX() + 1) Surrogate key generators

18.2.1.2 Random Number Surrogate keys

18.2.1.3 Surrogate Key Tables

18.2.2 Arguments for Intelligent Keys

18.2.2.1 Saving Space in the Database

18.2.2.2 Verification of the Data

18.2.2.3 Changing Surrogate Keys is Hard

18.2.2.4 Getting New Keys

18.2.2.5 Clustering Versus Random Distribution

18.2.2.6 Standards Versus "Roll Your Own"

18.2.2.7 Data Warehousing

18.2.2.8 Replicated and Distributed Databases

18.3 Simple versus Compound Keys

18.3.1 Super keys and Overlapping Keys

18.3.2 Queries with Compound keys

19.0 Different Relational Models

19.1 Chris Date = no duplicates, no NULLs

19.2 E.F Codd, RM version I

19.3 E.F Codd, RM version II

19.4 SQL-92 = duplicates, one NULL

19.5 Rick Snodgrass = Temporal SQL

19.5 Roth & Korth = duplicates, one NULL, non-1NF Tables

20.0 Basic Relational Operations

20.1 Projection

20.2 Restriction

20.3 Computations

20.3.1 Expressions Using Other Tables

20.3.1 Simple Operators and Expressions

20.4 Joins

20.4.1 Inner Join

20.4.2 Outer Join

20.4.2.1 Left Outer Join

20.4.2.2 Right Outer Join

20.4.2.3 Full Outer Join

20.4.3 Self Join

20.4.4 Union Join

21.0 Transactions and Concurrency Control

21.1 Sessions

21.2 Transactions and ACID

21.3 Concurrency

21.3.1 The Three Phenomena

21.3.2 The Four Isolation Levels

21.4 Pessimistic Concurrency Control

21.5 Optimistic Concurrency Control

21.6 Logical Concurrency Control

21.7 Deadlock and Livelocks

22.0 Functional Dependencies

22.1 Armstrong's Axioms

22.2 Multi-valued Dependencies

22.3 Mappings

23.0 Normalization

23.1 Anomalies

23.2 First Normal Form (1NF)

23.2.1 Note on Repeated Groups

23.3 Second Normal Form (2NF)

23.4 Third Normal Form (3NF)

23.5 Boyce-Codd Normal Form (BCNF)

23.6 Fourth Normal Form (4NF)

23.7 Fifth Normal Form (5NF)

23.8 Domain-Key Normal Form (DKNF)

23.9 Functional Dependency Axioms

23.10 Practical Hints for Normalization

23.10.1 Case Tools for Normalization

24.0 Denormalization

24.1. Criteria for Denormalization

24.2 Types of Denormalization

24.2.1 Pre-joined Tables

24.2.2 Reports

24.2.3 Mirrored Tables

24.2.4 Table Splitting

24.2.5 Combined Tables

24.2.6 Redundant Data

24.2.7 Repeating Groups

24.2.8 Derivable Data

24.2.9 Hierarchy Tables

24.2.10. Overloaded Datatypes

25.0 Metadata

25.1 Data

25.1.1 Data Assets

25.1.2 Data Engineering Assets

25.1.3 Core Data

25.1.3 Data Management Assets

25.2 Metadata Management

25.2.1 Database Management Systems

25.2.2 Data Dictionary

25.2.3 Data Directory

25.2.4 Data Encyclopedia

25.2.5 Data Element Registry

25.2.6 Repository

25.3 Data Dictionary Environments

25.3.1 Application Data Dictionary

25.3.2 Functional Area Data Dictionaries

25.3.3 The ITS Data Dictionary/Registry

25.4 NCITS L8 Standards

25.4.1 Naming Data Elements

25.4.1.1 Levels of Abstraction

25.4.2 Registering Standards

Product details

Edition: 1
Latest edition
Published: July 27, 1999
Language: English

About the author

Joe Celko

Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.

Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas.

He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.

Affiliations and expertise

Independent Consultant, Austin, Texas

Life Sciences

Physical Sciences & Engineering

Social Sciences & Humanities

Health