DDL vs DML statements in SQL: A storytelling approach for Data Engineers and Data Scientists
Imagine you’re a data engineer or data scientist, and you’re tasked with building a new data warehouse. You need to start by creating the database structure, and for that, you’ll need to use DDL statements. DDL stands for Data Definition Language, and it’s used to create, alter, and drop database objects, such as tables, views, and indexes.
Once you’ve created the database structure, you can start filling it with data using DML statements. DML stands for Data Manipulation Language, and it’s used to insert, update, delete, and select data from database tables.
Here’s a simple analogy to help you understand the difference between DDL and DML statements:
Imagine you’re building a house.
DDL statements would be like the blueprints for your house. They define the overall structure of the house, such as the number of rooms, the size of the rooms, and the location of the doors and windows.
DML statements would be like the actual construction of the house. You use DML statements to add new rooms, change the size of rooms, and move doors and windows.
In the context of data engineering and data science, DDL statements are used to create and manage the database structure that will store the data.
DML statements are used to manipulate the data in the database to perform tasks such as data analysis, machine learning, and data visualization.
Here are some specific examples of how DDL and DML statements are used in data engineering and data science:
- DDL: A data engineer might use DDL statements to create a database table to store customer data. The DDL statements would define the structure of the table, such as the column names and data types.
- DML: A data scientist might use DML statements to query the customer data table to identify the top ten customers by revenue. The DML statements would be used to select the relevant data from the table and to sort it by revenue.
DDL statements:
- CREATE TABLE: Creates a new table in the database.
- ALTER TABLE: Modifies the structure of an existing table.
- DROP TABLE: Deletes a table from the database.
- CREATE VIEW: Creates a virtual table that is based on one or more existing tables.
- CREATE INDEX: Creates an index on a column in a table to improve the performance of queries.
DML statements:
- INSERT: Inserts a new row into a table.
- UPDATE: Updates the data in an existing row in a table.
- DELETE: Deletes a row from a table.
- SELECT: Queries the data in a table.
DDL and DML statements are essential tools for data engineers and data scientists. By understanding the difference between the two types of statements, you can use them effectively to build and manage databases, and to manipulate data for analysis and reporting.
Fun fact: DDL and DML statements are also used by other database management systems, such as MySQL, Oracle, and PostgreSQL. So, the skills you learn with SQL will be transferable to other database systems.