CSV Database
1 Connecting to CSV Data Sources
The connector for CSV allows connecting to local and remote CSV resources. Set the URI property to the CSV resource location, in addition to any other properties necessary to connect to your data source.
1.1 Connecting to Local Files
Set the URI to a folder containing CSV files.
Below is an example connection string:
URI=C:\folder1;
You can also connect to multiple CSV files that share the same schema and expose them as a single table by setting AggregateFiles to True. Below is an example connection string:
URI=C:\folder; AggregateFiles=True;
If you would prefer to expose each CSV file as its own table instead, leave AggregateFiles set to False.
URI=C:\folder; AggregateFiles=False;
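The connection strings in this document are semicolon-separated name=value pairs. As a minimal sketch, assuming you assemble them in Python before passing them to the connector, a helper such as the hypothetical build_connection_string below keeps the property names and values in one place. The helper is illustrative only and is not part of the connector.

```python
def build_connection_string(properties):
    """Join name/value pairs into 'Name=Value; Name=Value;' form.

    Hypothetical helper for illustration; not part of the connector.
    """
    parts = []
    for name, value in properties.items():
        # Python booleans format as True/False, matching the examples above.
        parts.append(f"{name}={value}")
    return "; ".join(parts) + ";"


# One table built from all files in the folder.
aggregated = build_connection_string({"URI": r"C:\folder", "AggregateFiles": True})
# One table per file.
separate = build_connection_string({"URI": r"C:\folder", "AggregateFiles": False})
print(aggregated)
print(separate)
```

The sketch does no escaping, so it only suits values that contain no semicolons.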
1.2 Connecting to HTTP CSV Streams
Set the URI to the HTTP or HTTPS URL of the CSV resource you want to access as a table. For example:
URI=http://www.host1.com/streamname1;
To authenticate, set AuthScheme and the corresponding properties. Specify additional headers in CustomHeaders. To modify the query string, set CustomUrlParams.
To query the CSV stream, reference streamedtable as the table name.
SELECT * FROM streamedtable
1.3 Connecting to Amazon S3
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to an Amazon Web Services Access Key (a username).
AWSSecretKey: Set this to an Amazon Web Services Secret Key.
For example:
URI=s3://bucket1/folder1; AWSAccessKey=token1;
AWSSecretKey=secret1; AWSRegion=OHIO;
Optionally, specify AWSRegion in addition.
Note: It is also possible to connect to S3-compatible services by specifying their base URL in StorageBaseURL. For example, if the StorageBaseURL connection property is set to http://s3.%region%.myservice.com and Region is region-1, the connector generates request URLs like https://s3.region-1.myservice.com/bucket/... (or like https://bucket.s3.region-1.myservice.com/... if the UseVirtualHosting property is true).
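The URL expansion described in the note can be sketched as follows. This is an approximation for illustration, not the connector's implementation: the function name build_request_url and its parameters are hypothetical, and the sketch keeps whatever scheme the base URL carries, which is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def build_request_url(storage_base_url, region, bucket, key, use_virtual_hosting=False):
    """Expand a %region% template and place the bucket in the path or the host.

    Hypothetical helper; the connector's real logic may differ.
    """
    # Substitute the region placeholder in the base URL template.
    base = storage_base_url.replace("%region%", region)
    parts = urlsplit(base)
    if use_virtual_hosting:
        # Virtual-hosted style: the bucket becomes a subdomain.
        netloc = f"{bucket}.{parts.netloc}"
        path = f"/{key}"
    else:
        # Path style: the bucket is the first path segment.
        netloc = parts.netloc
        path = f"/{bucket}/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


template = "https://s3.%region%.myservice.com"
print(build_request_url(template, "region-1", "bucket", "folder1/data.csv"))
print(build_request_url(template, "region-1", "bucket", "folder1/data.csv",
                        use_virtual_hosting=True))
```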
1.4 Connecting to Oracle Cloud Object Storage
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to an Oracle Cloud Access Key.
AWSSecretKey: Set this to an Oracle Cloud Secret Key.
OracleNamespace: Set this to an Oracle Cloud namespace.
For example:
URI=os://bucket/remotePath/; AWSAccessKey=token1; AWSSecretKey=secret1;
OracleNamespace=myNamespace; Region=us-ashburn-1;
Optionally, specify Region in addition.
1.5 Connecting to Wasabi
Set the URI to the bucket and folder. Additionally, set the following properties to authenticate:
AWSAccessKey: Set this to a Wasabi Access Key (a username).
AWSSecretKey: Set this to a Wasabi Secret Key.
Optionally, specify AWSRegion in addition.
For example:
URI=wasabi://bucket1/folder1; AWSAccessKey=token1;
AWSSecretKey=secret1; AWSRegion=OHIO;
1.6 Connecting to Azure Blob Storage
Set the URI to the name of your container and the name of the blob. Additionally, set the following properties to authenticate:
AzureStorageAccount: Set this to the account associated with the Azure blob.
AzureAccessKey: Set this to the access key associated with the Azure blob.
For example:
URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount;
AzureAccessKey=myKey;
You can also use OAuth authentication to connect to Azure Blob Storage. For example:
URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount;
AuthScheme=AzureAD; InitiateOAuth=GETANDREFRESH;
If you are connecting from an Azure VM with permissions for Azure Blob Storage, you can simply set AuthScheme to AzureMSI. For example:
URI=azureblob://mycontainer/myblob/; AzureStorageAccount=myAccount;
AuthScheme=AzureMSI;
To authenticate as a service principal with a client certificate instead of a client secret, set the following properties:
InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
AzureTenant: Set this to the tenant you wish to connect to.
OAuthGrantType: Set this to CLIENT.
OAuthClientId: Set this to the Client Id in your app settings.
OAuthJWTCert: Set this to the JWT Certificate store.
OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.
For example:
AuthScheme=AzureServicePrincipal;InitiateOAuth=GETANDREFRESH;
OAuthClientId=MyClientId;AzureTenant=MyAzureTenant;
OAuthJWTCert=MyOAuthJWTCert;OAuthJWTCertType=PFXFile
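Because the certificate-based flow involves several properties, it can help to check a connection string before using it. The following is a minimal sketch under stated assumptions: parse_connection_string and missing_properties are hypothetical helpers, the required list simply mirrors the properties named in this section, and the parser treats property names as case-sensitive and does not handle values that contain semicolons.

```python
def parse_connection_string(text):
    """Split 'Name=Value;Name=Value' into a dict (hypothetical helper)."""
    props = {}
    for segment in text.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate empty segments such as a trailing ';'
        name, _, value = segment.partition("=")
        props[name.strip()] = value.strip()
    return props


# Properties named in this section for certificate-based authentication.
REQUIRED = (
    "AuthScheme", "InitiateOAuth", "AzureTenant", "OAuthGrantType",
    "OAuthClientId", "OAuthJWTCert", "OAuthJWTCertType",
)


def missing_properties(text, required=REQUIRED):
    """Return the required property names absent from the connection string."""
    props = parse_connection_string(text)
    return [name for name in required if name not in props]


candidate = (
    "AuthScheme=AzureServicePrincipal;InitiateOAuth=GETANDREFRESH;"
    "OAuthGrantType=CLIENT;OAuthClientId=MyClientId;AzureTenant=MyAzureTenant;"
    "OAuthJWTCert=MyOAuthJWTCert;"
)
print(missing_properties(candidate))
```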
1.7 Connecting to Azure Data Lake Store Gen 2
Set the URI to the name of the file system and the name of the folder which contains your CSV files. Additionally, set the following properties to authenticate:
AzureStorageAccount: Set this to the account associated with the Azure data lake store.
AzureAccessKey: Set this to the access key associated with the Azure data lake store.
For example:
URI=abfs://myfilesystem/folder1; AzureStorageAccount=myAccount;
AzureAccessKey=myKey;
URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount;
AzureAccessKey=myKey;
You can also use OAuth authentication to connect to Azure Data Lake Store Gen 2. For example:
URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount;
AuthScheme=AzureAD; InitiateOAuth=GETANDREFRESH;
If you are connecting from an Azure VM with permissions to connect to Azure Data Lake Store Gen 2, you can simply set AuthScheme to AzureMSI. For example:
URI=abfss://myfilesystem/folder1; AzureStorageAccount=myAccount; AuthScheme=AzureMSI;
To authenticate as a service principal with a client certificate instead of a client secret, set the following properties:
InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
AzureTenant: Set this to the tenant you wish to connect to.
OAuthGrantType: Set this to CLIENT.
OAuthClientId: Set this to the Client Id in your app settings.
OAuthJWTCert: Set this to the JWT Certificate store.
OAuthJWTCertType: Set this to the type of the certificate store specified by OAuthJWTCert.
For example:
AuthScheme=AzureServicePrincipal;InitiateOAuth=GETANDREFRESH;
OAuthClientId=MyClientId;AzureTenant=MyAzureTenant;
OAuthJWTCert=MyOAuthJWTCert;OAuthJWTCertType=PFXFile
1.8 Connecting to Azure File Storage
Set the URI to the name of your Azure file share and the name of the resource. Additionally, set the following properties to authenticate:
AzureStorageAccount (Required): Set this to the account associated with the Azure file.
AzureAccessKey: Set this to the access key associated with the Azure file.
AzureSharedAccessSignature: Set this to the shared access signature associated with the Azure file.
For example:
URI=azurefile://fileShare/remotePath/; AzureStorageAccount=myAccount;
AzureAccessKey=myAccessKey;
URI=azurefile://fileShare/remotePath/; AzureStorageAccount=myAccount;
AzureSharedAccessSignature=mySharedSignature;
1.9 Connecting to Box
Set the URI to the path to a folder containing CSV files. To authenticate to Box, use the OAuth authentication standard. See the Box Connector for an authentication guide.
For example:
URI=box://folder1; InitiateOAuth=GETANDREFRESH;
OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1;
CallbackUrl=http://localhost:12345;
1.10 Connecting to Dropbox
Set the URI to the path to a folder containing CSV files. To authenticate to Dropbox, use the OAuth authentication standard. See the Dropbox Connector for an authentication guide. You can authenticate with a user account or a service account. In the user account flow, you do not need to set any connection properties for your user credentials, as shown in the connection string below:
URI=dropbox://folder1/; InitiateOAuth=GETANDREFRESH;
OAuthClientId=oauthclientid1; OAuthClientSecret=oauthclientsecret1;
CallbackUrl=http://localhost:12345;
1.11 Connecting to Google Drive
Set the URI to the name of the file system and the name of the folder which contains your CSV files. To access shared files, set SharedWithMe as the name of the folder which contains your CSV files. For example: URI=gdrive://SharedWithMe/remotePath. To authenticate to Google APIs, use the OAuth authentication standard. You can authorize the provider to connect to Google APIs on behalf of individual users or on behalf of a domain. See the Google Drive Connector for an authentication guide.
For example:
URI=gdrive://folder1;InitiateOAuth=GETANDREFRESH;
1.12 Connecting to SharePoint Online SOAP
Set the URI to a document library containing CSV files. To authenticate, set User and Password and StorageBaseURL.
For example:
URI=sp://Documents/folder1; User=user1; Password=password1;
StorageBaseURL=https://subdomain.sharepoint.com;
Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". You should use the onedrive:// scheme when connecting to these sites because they do not support the SharePoint components that the provider needs to download files.
1.13 Connecting to SharePoint Online REST
Set the URI to a document library containing CSV files. StorageBaseURL is optional. If not provided, the driver will work with the root drive. To authenticate, use the OAuth authentication standard.
For example:
URI=sp://Documents/folder1; InitiateOAuth=GETANDREFRESH;
StorageBaseURL=https://subdomain.sharepoint.com;
Note that this connection method may not work if the StorageBaseURL ends with "-my.sharepoint.com". You should use the onedrive:// scheme when connecting to these sites because they do not support the SharePoint components that the provider needs to download files.
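The rule in the notes for both SharePoint Online connection methods can be expressed as a small check on the StorageBaseURL host name. This is a sketch only: the function name suggested_scheme is hypothetical, and the check covers nothing beyond the "-my.sharepoint.com" suffix mentioned above.

```python
from urllib.parse import urlsplit


def suggested_scheme(storage_base_url):
    """Return 'onedrive' for '-my.sharepoint.com' hosts, otherwise 'sp'.

    Hypothetical helper; it encodes only the rule stated in the note above.
    """
    # urlsplit lowercases the host name and ignores any trailing path.
    host = urlsplit(storage_base_url).hostname or ""
    return "onedrive" if host.endswith("-my.sharepoint.com") else "sp"


print(suggested_scheme("https://subdomain.sharepoint.com"))
print(suggested_scheme("https://subdomain-my.sharepoint.com"))
```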
1.14 Connecting to FTP
Set the URI to the address of the server followed by the path to the folder to be used as the root folder. To authenticate, set User and Password.
For example:
URI=ftps://localhost:990/folder1; User=user1; Password=password1;
1.15 Connecting to Google Cloud Storage
Set the URI to the name of the file system and the name of the folder which contains your CSV files. To authenticate to Google APIs, provide a ProjectId.
For example:
URI=gs://bucket/remotePath/; ProjectId=PROJECT_ID;
2 Securing CSV Connections
By default, the connector attempts to negotiate SSL/TLS by checking the server's certificate against the system's trusted certificate store. To specify another certificate, see the SSLServerCert property for the available formats.
The following are the connection properties for CSVDatabase. Not all properties are required. Enter only property values pertaining to your installation. Several properties will be automatically initialized with the appRules defaults.
Property: Description
Authentication
AuthScheme: The type of authentication to use when connecting to remote services.
AWSAccessKey: Your AWS account access key. This value is accessible from your AWS security credentials page.
AWSRegion: The hosting region for your Amazon Web Services.
AWSRoleARN: The Amazon Resource Name of the role to use when authenticating.
AWSSecretKey: Your AWS account secret key. This value is accessible from your AWS security credentials page.
Password: The password used to authenticate the user.
URL: The URL of the cloud storage service provider.
User: The CSV user account used to authenticate.
AWS Authentication
MFASerialNumber: The serial number of the MFA device if one is being used.
MFAToken: The temporary token available from your MFA device.
Azure Authentication
AzureAccessKey: The storage key associated with your Azure Blob storage account.
AzureAccount: The name of your Azure Blob storage account.
AzureSharedAccessSignature: A shared access key signature that may be used for authentication.
AzureTenant: The Microsoft Online tenant being used to access data. If not specified, your default tenant will be used.
Caching
CacheTolerance: The tolerance for stale data in the cache, specified in seconds, when using AutoCache.
Connection
OracleNamespace: The Oracle Cloud Object Storage namespace to use.
Region: The hosting region for your S3-like Web Services.
URI: This property specifies a URI for the CSV resource location.
UseVirtualHosting: If true (default), buckets will be referenced in the request using the hosted-style request: http://yourbucket.s3.amazonaws.com/yourobject. If set to false, the provider will use the path-style request: http://s3.amazonaws.com/yourbucket/yourobject. Note that this property is set to false for an S3-based custom service when the CustomURL is specified.
Data
IgnoreBlankRows: Indicates whether to skip empty rows.
NullValues: A comma-separated list of values which will be replaced with nulls if they are found in the CSV file.
PushEmptyValuesAsNull: Indicates whether to read empty values as empty or as null.
Firewall
FirewallPassword: A password used to authenticate to a proxy-based firewall.
FirewallPort: The TCP port for a proxy-based firewall.
FirewallServer: The name or IP address of a proxy-based firewall.
FirewallType: The protocol used by a proxy-based firewall.
FirewallUser: The user name to use to authenticate with a proxy-based firewall.
JWTOAuth
OAuthJWTCert: The JWT Certificate store.
OAuthJWTCertPassword: The password for the OAuth JWT certificate.
OAuthJWTCertSubject: The subject of the OAuth JWT certificate.
OAuthJWTCertType: The type of key store containing the JWT Certificate.
OAuthJWTIssuer: The issuer of the JSON Web Token.
OAuthJWTSubject: The user subject for which the application is requesting delegated access.
Kerberos
KerberosKDC: The Kerberos Key Distribution Center (KDC) service used to authenticate the user.
KerberosKeytabFile: The Keytab file containing your pairs of Kerberos principals and encrypted keys.
KerberosRealm: The Kerberos Realm used to authenticate the user.
KerberosServiceKDC: The Kerberos KDC of the service.
KerberosServiceRealm: The Kerberos realm of the service.
KerberosSPN: The service principal name (SPN) for the Kerberos Domain Controller.
KerberosTicketCache: The full file path to an MIT Kerberos credential cache file.
Logging
Logfile: A filepath which designates the name and location of the log file.
LogModules: Core modules to be included in the log file.
MaxLogFileCount: A string specifying the maximum file count of log files. When the limit is hit, a new log is created in the same folder with the date and time appended to the end and the oldest log file will be deleted.
MaxLogFileSize: A string specifying the maximum size in bytes for a log file (for example, 10 MB). When the limit is hit, a new log is created in the same folder with the date and time appended to the end.
Verbosity: The verbosity level that determines the amount of detail included in the log file.
Misc
AggregateFiles: When set to true, the provider will aggregate all of the files located in the URI directory into a single table called AggregatedFiles.
AzureEnvironment: The Azure Environment to use when establishing a connection.
ConnectionLifeTime: The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed.
ConnectionString: ***
Culture: This setting can be used to specify culture settings that determine how the provider interprets certain data types that are passed into the provider. For example, setting Culture='de-DE' will output German formats even on an American machine.
CustomHeaders: Other headers as determined by the user (optional).
CustomUrlParams: The custom query string to be included in the request.
DirectoryRetrievalDepth: Limits the subfolders recursively scanned when IncludeSubdirectories is enabled.
ExcludeFileExtensions: Set to true if file extensions should be excluded from table names.
ExtendedProperties: The Microsoft Jet OLE DB 4.0-compatible extended properties for text files.
FMT: The format to be used to parse all text files.
GenerateHiveDDL: Specifies a directory in which the provider will store the DDL statements required to query the data generated by INSERT queries. This is only valid for the S3 target.
GenerateSchemaFiles: Indicates the user preference as to when schemas should be generated and saved.
HDR: Whether to get column names from the first line of the specified files.
IncludeColumnHeaders: Whether to get column names from the first line of the specified files.
IncludeFiles: Comma-separated list of file extensions to include in the set of files modeled as tables.
IncludeSubdirectories: Whether to read files from nested folders. In the case of a name collision, table names are prefixed by the underscore-separated folder names.
MaxRows: Limits the number of rows returned when no aggregation or group by is used in the query. This helps avoid performance issues at design time.
MetadataDiscoveryURI: Used together with AggregateFiles, this property specifies a file from which to read the schema of the AggregatedFiles result set.
Other: These hidden properties are used only in specific use cases.
PoolIdleTimeout: The allowed idle time for a connection before it is closed.
PoolMaxSize: The maximum connections in the pool.
PoolMinSize: The minimum number of connections in the pool.
PoolWaitTime: The max seconds to wait for an available connection.
ProjectId: The id of the project where your Google Cloud Storage instance resides.
PseudoColumns: This property indicates whether or not to include pseudo columns as columns in the table.
QuoteCharacter: Determines the character which will be used to quote values.
QuoteEscapeCharacter: Determines the character which will be used to escape quotes.
Readonly: You can use this property to enforce read-only access to CSV from the provider.
RowDelimiter: The character which will be used to detect the end of a CSV row.
RowScanDepth: The number of rows to scan when dynamically determining columns for the table.
SharepointURL: The URL required for the Sharepoint cloud storage service provider.
SimpleUploadLimit: This setting specifies the threshold, in bytes, above which the provider will choose to perform a multipart upload rather than uploading everything in one request.
SkipHeaderComments: If set to true, skips rows at the top of the file beginning with #.
SkipTop: Skips the specified number of rows, starting from the top.
SSLServerCert: The certificate to be accepted from the server when connecting using TLS/SSL.
SupportEnhancedSQL: This property enhances SQL functionality beyond what can be supported through the API directly, by enabling in-memory client-side processing.
Timeout: The value in seconds until the timeout error is thrown, canceling the operation.
TrimSpaces: Set to True if you want the provider to trim leading and trailing spaces in a cell containing a quoted value.
TruncateOnInserts: Set to True if you want the provider to truncate on every (batch) insert.
TypeDetectionScheme: Specifies how the provider determines the data types of columns.
UseConnectionPooling: This property enables connection pooling.
UseRowNumbers: Set this to true if you are deleting or updating in CSV and you do not want to specify a custom schema. This will create a new column with the name RowNumber which will be used as the key for that table.
UseTempFile: Set to True if you want to use temp files when inserting in a CSV file.
OAuth
AuthKey: The authentication secret used to request and obtain the OAuth Access Token.
AuthToken: The authentication token used to request and obtain the OAuth Access Token.
CallbackURL: The OAuth callback URL to return to when authenticating. This value must match the callback URL you specify in your app settings.
InitiateOAuth: Set this property to initiate the process to obtain or refresh the OAuth access token when you connect.
OAuthAccessToken: The access token for connecting using OAuth.
OAuthAccessTokenSecret: The OAuth access token secret for connecting using OAuth.
OAuthAccessTokenURL: The URL to retrieve the OAuth access token from.
OAuthAuthorizationURL: The authorization URL for the OAuth service.
OAuthClientId: The client ID assigned when you register your application with an OAuth authorization server.
OAuthClientSecret: The client secret assigned when you register your application with an OAuth authorization server.
OAuthExpiresIn: The lifetime in seconds of the OAuth AccessToken.
OAuthGrantType: The grant type for the OAuth flow.
OAuthParams: A comma-separated list of other parameters to submit in the request for the OAuth access token in the format paramname=value.
OAuthRefreshToken: The OAuth refresh token for the corresponding OAuth access token.
OAuthRefreshTokenURL: The URL to refresh the OAuth token from.
OAuthRequestTokenURL: The URL the service provides to retrieve request tokens from. This is required in OAuth 1.0.
OAuthSettingsLocation: The location of the settings file where OAuth values are saved when InitiateOAuth is set to GETANDREFRESH or REFRESH. Alternatively, this can be held in memory by specifying a value starting with memory://.
OAuthTokenTimestamp: The Unix epoch timestamp in milliseconds when the current Access Token was created.
OAuthVerifier: The verifier code returned from the OAuth authorization URL.
OAuthVersion: The version of OAuth being used.
Proxy
ProxyAuthScheme: The authentication type to use to authenticate to the ProxyServer proxy.
ProxyAutoDetect: This indicates whether to use the system proxy settings or not. This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.
ProxyExceptions: A semicolon-separated list of destination hostnames or IPs that are exempt from connecting through the ProxyServer.
ProxyPassword: A password to be used to authenticate to the ProxyServer proxy.
ProxyPort: The TCP port the ProxyServer proxy is running on.
ProxyServer: The hostname or IP address of a proxy to route HTTP traffic through.
ProxySSLType: The SSL type to use when connecting to the ProxyServer proxy.
ProxyUser: A user name to be used to authenticate to the ProxyServer proxy.
Schema
SchemaIniLocation: A path to the directory that contains the schema.ini file.
SFTP
SSLMode: The authentication mechanism to be used when connecting to the FTP or SFTP server.
SSH
SSHAuthMode: The authentication method to be used to log on to an SFTP server.
SSHClientCert: A certificate to be used for authenticating the user.
SSHClientCertPassword: The password of the SSHClientCert certificate if it has one.
SSHClientCertType: The type of SSHClientCert certificate.
SSL
SSLClientCert: The TLS/SSL client certificate store for SSL Client Authentication (2-way SSL).
SSLClientCertPassword: The password for the TLS/SSL client certificate.
SSLClientCertSubject: The subject of the TLS/SSL client certificate.
SSLClientCertType: The type of key store containing the TLS/SSL client certificate.
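To make the Data properties listed above more concrete, the following sketch approximates what IgnoreBlankRows, NullValues, and PushEmptyValuesAsNull describe, using Python's standard csv module. It is an illustration of the described behavior under stated assumptions, not the connector's implementation; the function read_rows and its parameter names are hypothetical.

```python
import csv
import io


def read_rows(text, null_values=(), ignore_blank_rows=True,
              push_empty_values_as_null=False):
    """Read CSV text, optionally skipping blank rows and mapping values to None.

    Hypothetical helper approximating the Data properties described above.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        # csv.reader yields an empty list for a completely empty line.
        if ignore_blank_rows and not any(cell.strip() for cell in row):
            continue
        cleaned = []
        for cell in row:
            if cell in null_values or (push_empty_values_as_null and cell == ""):
                cleaned.append(None)  # listed null marker or empty value
            else:
                cleaned.append(cell)
        rows.append(cleaned)
    return rows


sample = "id,name\n1,NA\n\n2,\n"
print(read_rows(sample, null_values=("NA",), push_empty_values_as_null=True))
```

With both options left at their defaults in this sketch, "NA" and the empty string are returned unchanged as text.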