This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Download Microsoft Edge
More info about Internet Explorer and Microsoft Edge
In this article, you learn how to use the Databricks ODBC driver to connect Azure Databricks with Python or R language. Once you establish the connection, you can access the data in Azure Databricks from the Python or R clients. You can also use the clients to further analyze the data.
Prerequisites
You must have an Azure Databricks workspace, a Spark cluster, and sample data associated with your cluster. If you do not already have these prerequisites, complete the quickstart at
Get started
.
Download the Databricks ODBC driver from
Databricks driver download page
. Install the 64-bit version of the driver.
Set up a personal access token in Databricks. For instructions, see
Token management
.
Set up a DSN
A data source name (DSN) contains the information about a specific data source. An ODBC driver needs this DSN to connect to a data source. In this section, you set up a DSN that can be used with the Databricks ODBC driver to connect to Azure Databricks from clients like Python or R.
From the Azure Databricks workspace, navigate to the Databricks cluster.
Under the
Configuration
tab, click the
JDBC/ODBC
tab and copy the values for
Server Hostname
and
HTTP Path
. You need these values to complete the steps in this article.
On your computer, start
ODBC Data Sources
application 64-bit.
Under the
User DSN
tab, click
Add
. In the
Create New Data Source
dialog box, select the
Simba Spark ODBC Driver
, and then click
Finish
.
In the
Simba Spark ODBC Driver
dialog box, provide the following values:
The following table provides information on the values to provide in the dialog box.
Field
Value
Host(s)
Provide the value that you copied from the Databricks workspace for
Server hostname
.
Enter
443
.
Authentication
>
Mechanism
Select
User name and password
.
User name
Enter
token
.
Password
Enter the token value that you copied from the Databricks workspace.
Perform the following additional steps in the DSN setup dialog box.
Click
HTTP Options
. In the dialog box that opens up, paste the value for
HTTP Path
that you copied from Databricks workspace. Click
OK
.
Click
SSL Options
. In the dialog box that opens up, select the
Enable SSL
check box. Click
OK
.
Click
Test
to test the connection to Azure Databricks. Click
OK
to save the configuration.
In the
ODBC Data Source Administrator
dialog box, click
OK
.
You now have your DSN set up. In the next sections, you use this DSN to connect to Azure Databricks from Python or R.
Connect from R
This section provides information on how to integrate an R Studio client running on your desktop with Azure Databricks. For instructions on how to use R Studio on the Azure Databricks cluster itself, see
R Studio on Azure Databricks
.
In this section, you use an R language IDE to reference data available in Azure Databricks. Before you begin, you must have the following installed on the computer.
An IDE for R language. This article uses RStudio for Desktop. You can install it from
R Studio download
.
If you use RStudio for Desktop as your IDE, also install Microsoft R Client from
https://aka.ms/rclient/
.
Open RStudio and do the following steps:
Reference the
RODBC
package. This enables you to connect to Azure Databricks using the DSN you created earlier.
Establish a connection using the DSN.
Run a SQL query on the data in Azure Databricks. In the following snippet,
radio_sample_data
is a table that already exists in Azure Databricks.
Perform some operations on the query to verify the output.
The following code snippet performs these tasks:
# reference the 'RODBC' package
require(RODBC)
# establish a connection using the DSN you created earlier
conn <- odbcConnect("<ENTER DSN NAME HERE>")
# run a SQL query using the connection you created
res <- sqlQuery(conn, "SELECT * FROM radio_sample_data")
# print out the column names in the query output
names(res)
# print out the number of rows in the query output
nrow (res)
Connect from Python
In this section, you use a Python IDE (such as IDLE) to reference data available in Azure Databricks. Before you begin, complete the following prerequisites:
Install Python from here. Installing Python from this link also installs IDLE.
From a command prompt on the computer, install the pyodbc
package. Run the following command:
pip install pyodbc
Open IDLE and do the following steps:
Import the pyodbc
package. This enables you to connect to Azure Databricks using the DSN you created earlier.
Establish a connection using the DSN you created earlier.
Run a SQL query using the connection you created. In the following snippet, radio_sample_data is a table that already exists in Azure Databricks.
Perform operations on the query to verify the output.
The following code snippet performs these tasks:
# import the `pyodbc` package:
import pyodbc
# establish a connection using the DSN you created earlier
conn = pyodbc.connect("DSN=<ENTER DSN NAME HERE>", autocommit=True)
# run a SQL query using the connection you created
cursor = conn.cursor()
cursor.execute("SELECT * FROM radio_sample_data")
# print the rows retrieved by the query.
for row in cursor.fetchall():
print(row)
Next steps
To learn about sources from where you can import data into Azure Databricks, see Data sources for Azure Databricks