If you’ve been trying to connect to a database on-premise or on a local disk, and found ambiguous online resources and inconsistent terminology, then you will enjoy this article
I’ve put my best effort to provide you with a clear, concise, and detailed description on how to connect to, and manage any database from Python
1. Introduction
2. PyODBC: Open DataBase Connectivity for Python
3. SQLite3: SQLite for Python
4. Executing SQL statements
5. Reading data into Pandas DataFrame
TL;DR: full code
1. Introduction
A database model determines the logical structure of a database (a database is an organized collection of data). This in turn determines how data can be stored, organized and manipulated. The Relational Model (RM) is the most popular database model since the 1980s. RM uses a table-based format, where tables are related by common columns
Database management system (DBMS)
DBMS is the software that you -or applications- use to interact with the database to create, read, update and manage data. The Relational DBMS (RDBMS) is the DBMS based on RM. According to DB-Engines, the most widely used RDBMS are: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM DB2, Microsoft Access, and SQLite
A database is often referred to by the DBMS used to manipulate it. For example, a database that is manipulated using Microsoft SQL Server is referred to as: Microsoft SQL Server-Database
Although not required, all RDBMS use SQL as a standard data access language
Database connectivity interface
A database connectivity interface allows an application to access data from a variety of DBMSs, using a specific driver for a specific DBMS and operating system. This means that the application can be written without depending on a specific DBMS or the operating system
Open DataBase Connectivity (ODBC) is a standard Microsoft Windows interface that enables applications (typically written in C or C++) to connect to DBMSs. While, Java DataBase Connectivity (JDBC) is a standard Oracle interface that enables applications written in Java to connect to DBMSs
2. PyODBC: Open DataBase Connectivity for Python
PyODBC is the Python package for ODBC. Through pyodbc, you can easily connect Python applications to a DBMS-database, using the required ODBC driver
Installation
Step 1: install pyodbc through pip
pip install pyodbc
Step 2: install the required driver for the DBMS-database you want to connect to. For example, if you want to connect to a Microsoft SQL Server-Database, you need to download and install the driver from Microsoft, after choosing your operating system
Connecting to a database
To make a connection to a database, we need to pass a connection string to the connect() function of pyodbc. The most important thing to remember is that: pyodbc passes the connection string directly to the DBMS-database driver unmodified. Therefore, connection strings are driver-specific
For example, to connect to a Microsoft SQL Server-Database, we provide the following connection string:
Note: you can find information about general connection string for most databases here. Make sure to follow the connection string formatting rules found here
The connection string can also be passed as arguments, which are then concatenated into a connection string:
In the connection string above, we assume that the SQL Server Authentication mode is set to Windows Authentication Mode, so we did not have to provide credentials to get access permissions to the server.
However, if it was a Mixed Authentication Mode (i.e. SQL Server Authentication + Windows Authentication), we can still use the above connection string, or the following one where we provide our User ID and Password to get access permissions to the server:
Pyodbc also supports Data Source Name connections. A DSN contains information about a specific database that an ODBC driver needs in order to connect to it (information such as the database driver, the server and database names, etc):
Lastly, depending on the database you are trying to access, and the version of Python you are using, you might need to set the connection encoding/decoding settings, which you can find here
For Microsoft SQL Server-Database and Python 3.x, using pyodbc default encoding/decoding settings is recommended (i.e. no need to do anything)
3. SQLite3: SQLite for Python
SQLite3 is the Python package for SQLite, which is the most widely deployed RDBMS in the world (arguably). The main features of SQLite are:
An embedded database. SQLite does not have a separate server process (i.e. server-less). It reads and writes directly to ordinary disk files
Free for use for any purpose, commercial or private
Installation
SQLite3 is included with Python by default. So even if you have created a new Anaconda environment, it will be installed for you
Connecting to a database
To create a connection to a disk-based database, we just need to pass the name of the database to the connect() function of sqlite3:
If the database does not exist, an empty one will be created for us
4. Executing SQL statements
Once we have a connection to the database, either from pyodbc or sqlite3, we can then create a Cursor object which represents a database cursor. We can do this using the cursor() method:
Now we have a Cursor object, we can execute any valid SQL query string using the Cursor execute() method:
Let’s assume that the database we are connected to (database.db), has 2 tables:
T_CUSTOMERS
T_ADDRESSES
SELECT statement
To select the id, fname, and lname columns from T_CUSTOMERS, we use SQL SELECT:
The executed SELECT statement returns rows, which we can retrieve using one of the Cursor fetch functions:
fetchone(): retrieves one row only, and move the Curser to the next row
fetchall(): retrieves all rows, and move the Curser to the end
fetchmany(size): retrieves a number of rows, and move the Curser to the next row
Note: if there are no rows left, fetchone() will return None, whereas fetchall() and fetchmany() will both return empty lists
For example, to retrieve one row only:
row will be a Tuple of values, one for each of the columns in the SELECT statement:
To get each column’s description, the Cursor description attribute can be used:
This description will be a list of Tuples, one for each column. Each Tuple has 7 items to describe the column, the one that matters to us is the first item, which is the column’s name (or alias, if specified in the SQL SELECT):
To retrieve all the remaining rows as a list, we can use fetchall():
UPDATE statement
To update an existing record in T_CUSTOMERS, we use SQL UPDATE with a WHERE clause
Note: if you do not use a WHERE clause in the UPDATE statement to specify which record(s) should be updated, all records will be updated!
Let’s update lname=’Trujillo’ to ‘Taqueria’:
If we want to know how many records were modified by the last SQL statement, we can use the Cursor rowcount attribute. This will return -1 if no SQL has been executed or if the number of rows is unknown:
Let’s check if that record has been updated successfully:
DELETE statement
To delete a customer’s record from T_CUSTOMERS, we use SQL DELETE with a WHERE clause
Note: if you do not use a WHERE clause in the DELETE statement to specify which record(s) should be deleted, all records will be deleted!
Let’s delete the record with id=4:
Let’s check how many records were modified:
Let’s check if that record has been deleted successfully:
INSERT INTO statement
To insert a new record into T_CUSTOMERS, we use SQL INSERT INTO
Let’s insert the same record we have deleted:
Let’s check how many records were modified:
Let’s check if that record has been inserted successfully:
Long SQL statement
Let’s say that we want to select the full name, and city and postal code of a customer with the last name ‘Trujillo’. We (or someone else) has written the following SQL statement to do this:
The easiest way to pass this long SQL statement to excute() is to use the triple-quote string format to encapsulate the statement. White-spaces (including tabs and newlines) should be ignored by the SQL database engine, but if we want to remove them from the left, we can use the dedent() function from the built-in textwrap module:
Finally, we must call commit() on the connection that created this cursor, otherwise our changes will be lost if we close the Script:
5. Reading data into Pandas DataFrame
Pandas can read an SQL statement directly into a dataframe without using a Cursor. This can be done using the read_sql(sql_string, connection) function
I’ve put my best effort to provide you with a clear, concise, and detailed description on how to connect to, and manage any database from Python
1. Introduction
2. PyODBC: Open DataBase Connectivity for Python
3. SQLite3: SQLite for Python
4. Executing SQL statements
5. Reading data into Pandas DataFrame
TL;DR: full code
1. Introduction
A database model determines the logical structure of a database (a database is an organized collection of data). This in turn determines how data can be stored, organized and manipulated. The Relational Model (RM) is the most popular database model since the 1980s. RM uses a table-based format, where tables are related by common columns
Database management system (DBMS)
DBMS is the software that you -or applications- use to interact with the database to create, read, update and manage data. The Relational DBMS (RDBMS) is the DBMS based on RM. According to DB-Engines, the most widely used RDBMS are: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM DB2, Microsoft Access, and SQLite
A database is often referred to by the DBMS used to manipulate it. For example, a database that is manipulated using Microsoft SQL Server is referred to as: Microsoft SQL Server-Database
Although not required, all RDBMS use SQL as a standard data access language
Database connectivity interface
A database connectivity interface allows an application to access data from a variety of DBMSs, using a specific driver for a specific DBMS and operating system. This means that the application can be written without depending on a specific DBMS or the operating system
Open DataBase Connectivity (ODBC) is a standard Microsoft Windows interface that enables applications (typically written in C or C++) to connect to DBMSs. While, Java DataBase Connectivity (JDBC) is a standard Oracle interface that enables applications written in Java to connect to DBMSs
2. PyODBC: Open DataBase Connectivity for Python
PyODBC is the Python package for ODBC. Through pyodbc, you can easily connect Python applications to a DBMS-database, using the required ODBC driver
Installation
Step 1: install pyodbc through pip
pip install pyodbc
Step 2: install the required driver for the DBMS-database you want to connect to. For example, if you want to connect to a Microsoft SQL Server-Database, you need to download and install the driver from Microsoft, after choosing your operating system
Connecting to a database
To make a connection to a database, we need to pass a connection string to the connect() function of pyodbc. The most important thing to remember is that: pyodbc passes the connection string directly to the DBMS-database driver unmodified. Therefore, connection strings are driver-specific
For example, to connect to a Microsoft SQL Server-Database, we provide the following connection string:
Note: you can find information about general connection string for most databases here. Make sure to follow the connection string formatting rules found here
The connection string can also be passed as arguments, which are then concatenated into a connection string:
In the connection string above, we assume that the SQL Server Authentication mode is set to Windows Authentication Mode, so we did not have to provide credentials to get access permissions to the server.
However, if it was a Mixed Authentication Mode (i.e. SQL Server Authentication + Windows Authentication), we can still use the above connection string, or the following one where we provide our User ID and Password to get access permissions to the server:
Pyodbc also supports Data Source Name connections. A DSN contains information about a specific database that an ODBC driver needs in order to connect to it (information such as the database driver, the server and database names, etc):
Lastly, depending on the database you are trying to access, and the version of Python you are using, you might need to set the connection encoding/decoding settings, which you can find here
For Microsoft SQL Server-Database and Python 3.x, using pyodbc default encoding/decoding settings is recommended (i.e. no need to do anything)
3. SQLite3: SQLite for Python
SQLite3 is the Python package for SQLite, which is the most widely deployed RDBMS in the world (arguably). The main features of SQLite are:
An embedded database. SQLite does not have a separate server process (i.e. server-less). It reads and writes directly to ordinary disk files
Free for use for any purpose, commercial or private
Installation
SQLite3 is included with Python by default. So even if you have created a new Anaconda environment, it will be installed for you
Connecting to a database
To create a connection to a disk-based database, we just need to pass the name of the database to the connect() function of sqlite3:
If the database does not exist, an empty one will be created for us
4. Executing SQL statements
Once we have a connection to the database, either from pyodbc or sqlite3, we can then create a Cursor object which represents a database cursor. We can do this using the cursor() method:
Now we have a Cursor object, we can execute any valid SQL query string using the Cursor execute() method:
Let’s assume that the database we are connected to (database.db), has 2 tables:
T_CUSTOMERS
T_ADDRESSES
SELECT statement
To select the id, fname, and lname columns from T_CUSTOMERS, we use SQL SELECT:
The executed SELECT statement returns rows, which we can retrieve using one of the Cursor fetch functions:
fetchone(): retrieves one row only, and move the Curser to the next row
fetchall(): retrieves all rows, and move the Curser to the end
fetchmany(size): retrieves a number of rows, and move the Curser to the next row
Note: if there are no rows left, fetchone() will return None, whereas fetchall() and fetchmany() will both return empty lists
For example, to retrieve one row only:
row will be a Tuple of values, one for each of the columns in the SELECT statement:
To get each column’s description, the Cursor description attribute can be used:
This description will be a list of Tuples, one for each column. Each Tuple has 7 items to describe the column, the one that matters to us is the first item, which is the column’s name (or alias, if specified in the SQL SELECT):
To retrieve all the remaining rows as a list, we can use fetchall():
UPDATE statement
To update an existing record in T_CUSTOMERS, we use SQL UPDATE with a WHERE clause
Note: if you do not use a WHERE clause in the UPDATE statement to specify which record(s) should be updated, all records will be updated!
Let’s update lname=’Trujillo’ to ‘Taqueria’:
If we want to know how many records were modified by the last SQL statement, we can use the Cursor rowcount attribute. This will return -1 if no SQL has been executed or if the number of rows is unknown:
Let’s check if that record has been updated successfully:
DELETE statement
To delete a customer’s record from T_CUSTOMERS, we use SQL DELETE with a WHERE clause
Note: if you do not use a WHERE clause in the DELETE statement to specify which record(s) should be deleted, all records will be deleted!
Let’s delete the record with id=4:
Let’s check how many records were modified:
Let’s check if that record has been deleted successfully:
INSERT INTO statement
To insert a new record into T_CUSTOMERS, we use SQL INSERT INTO
Let’s insert the same record we have deleted:
Let’s check how many records were modified:
Let’s check if that record has been inserted successfully:
Long SQL statement
Let’s say that we want to select the full name, and city and postal code of a customer with the last name ‘Trujillo’. We (or someone else) has written the following SQL statement to do this:
The easiest way to pass this long SQL statement to excute() is to use the triple-quote string format to encapsulate the statement. White-spaces (including tabs and newlines) should be ignored by the SQL database engine, but if we want to remove them from the left, we can use the dedent() function from the built-in textwrap module:
Finally, we must call commit() on the connection that created this cursor, otherwise our changes will be lost if we close the Script:
5. Reading data into Pandas DataFrame
Pandas can read an SQL statement directly into a dataframe without using a Cursor. This can be done using the read_sql(sql_string, connection) function
0 comments: