Summary of this whole damn article:
curl can spit back data from a remote file. Oracle External Tables can execute curl and use its output as their data source.
Have .csv or other data in Sharepoint (or any other website) that you need to get into Oracle? Don't download the file and use SQL*Loader; link directly to the file over the web and query it!
The method below describes how to link from Oracle to Sharepoint directly to the data file (yes, meaning that data, when updated in Sharepoint, is instantly reflected 'inside the database'!) using Oracle External Table functionality and the PREPROCESSOR clause. In my case, our Sharepoint server is using NTLM authentication.
** Caveat - In my case, the Linux curl binaries had not been updated to their most recent version. As such, I was unable to use the --netrc-file option to point curl at the proper password file (this matters because when the script is run from the external table, it runs as the Oracle OS user, not as the owner of the script, so curl would not look in my home directory by default). This means I had to place my credentials for curl in the shell script. Boo! If curl were updated, I could place my credentials in the .netrc file in my shell script owner's home directory and point curl at it with "--netrc-file /home/myuser/.netrc".
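For reference, here is roughly what that would look like with a current curl (the hostname, account, and paths are just examples): the credentials live in a .netrc file readable by the OS user that runs the preprocessor (i.e. oracle), and the script itself contains no password.

/home/myuser/.netrc:

machine sharepoint.yoursite.com
login DOMAIN/your_ad_user
password userspassword

getspdata.sh, using --netrc-file instead of inline credentials:

#!/bin/bash
# Same as the script below, but curl reads the login/password from the .netrc file.
/usr/bin/curl --ntlm --netrc-file /home/myuser/.netrc "$1" 2>/dev/null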
To create an example:
In Sharepoint:
- create a .csv file. In this example I have a 3-column file, like so:
col1,col2,col3
data1,data2,data3
- Place the file in Sharepoint. Make sure you have granted proper security for the item. In my case, I gave a service account we use for Active Directory tasks access to the folder/.csv.
- Create a shell script named getspdata.sh in a directory, with the following contents (read the above caveat; a quick command-line test of this script is sketched after the table DDL below):
#!/bin/bash
# $1 is the URL passed in from the external table's OPTIONS string (see the DDL below).
# Credentials are inline only because of the curl version caveat above.
/usr/bin/curl --ntlm --netrc-optional -u DOMAIN/your_ad_user:userspassword "$1" 2>/dev/null
- In that same directory, place a file named "nothing.txt" (trust me here). Nothing needs to be in this file.
- Grant proper security on the parent folder of the items you just made. In my case, the owner of the folder where I placed the shell script is in the same group as the Oracle user, so I was able to grant rwx rights on the folder to Group. Oracle needs this access for external table preprocessor functionality.
- Create an Oracle directory called SHAREPOINT_SCRIPTS (or a name of your choosing) pointing to the folder where you just placed the shell script (example CREATE DIRECTORY and GRANT statements are sketched after the table DDL below).
- Create an External Table that uses the PREPROCESSOR clause and points to this directory:

create table urschema.GET_DATA_FROM_SHAREPOINT_TST
( col1 VARCHAR2(4000), col2 VARCHAR2(4000), col3 VARCHAR2(4000) )
organization external (
  type ORACLE_LOADER
  default directory SHAREPOINT_SCRIPTS
  access parameters (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    PREPROCESSOR SHAREPOINT_SCRIPTS:'getspdata.sh' OPTIONS 'https://sharepoint.yoursite.com/this%20directory/thatdirectory/example_file.csv'
    BADFILE SHAREPOINT_SCRIPTS:'rej.bad'
    NOLOGFILE
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"' )
  location (SHAREPOINT_SCRIPTS:'nothing.txt')
) reject limit UNLIMITED;

- Notice in the above OPTIONS string that I've properly escaped the URL for spaces, etc. This URL becomes the $1 variable used in the previously created shell script; it's telling curl where to get the data.
- You should now be able to issue a select against your external table (ex. GET_DATA_FROM_SHAREPOINT_TST) and see the data from the remote .csv file (a minimal example query is sketched below).
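As mentioned above, it's worth sanity-checking the shell script from the command line before relying on the external table, ideally as the oracle OS user since that is who will run it. A quick sketch, using the example URL from the DDL (the directory path is a placeholder for wherever you put the script):

cd /path/to/sharepoint_scripts
./getspdata.sh 'https://sharepoint.yoursite.com/this%20directory/thatdirectory/example_file.csv'

If the credentials and URL are right, the raw contents of the .csv should print to the terminal, which is exactly what the external table will consume.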
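Likewise, for the directory step above, the statements might look something like this (run as a DBA; the filesystem path and the urschema account are placeholders):

create or replace directory SHAREPOINT_SCRIPTS as '/path/to/sharepoint_scripts';
-- READ/WRITE cover the location and bad files; EXECUTE lets Oracle run the preprocessor script.
grant read, write, execute on directory SHAREPOINT_SCRIPTS to urschema;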
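And the query itself is nothing special; every execution runs curl behind the scenes and returns whatever is currently in the remote file:

select col1, col2, col3
  from urschema.GET_DATA_FROM_SHAREPOINT_TST;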
[John K]
I talked to Russell a bit regarding this.
You're only using SharePoint as a file-serving host, which could conceivably be anything, but it is something we have now and that people could get (or already have) access to. It has a lot of moving parts, but ultimately it isn't in the critical path.
The "linux" portion is basically just a wrapper for curl that deals with the credentials. I might suggest you look at wget. You might be able to encode the username password in the URL (thus use the input-users credentials rather than your own). wget also has a -i - (input from stdin) if you can provide them that way.
It also supports .netrc or .wgetrc files. You'd have to do some testing, but it gives you a second bite at the apple (I believe wget uses curl libraries to do some of the work).
If you use wget, you'll probably want to use the "-O -" to make the downloaded URL go to stdout (which should be what curl is doing if I understand your script properly).
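A wget-based version of the preprocessor script, roughly as described here, might look something like this (an untested sketch; whether the NTLM negotiation succeeds depends on how wget was built):

#!/bin/bash
# Hypothetical wget variant of getspdata.sh: "-O -" sends the download to stdout
# (which is what the external table reads) and -q suppresses wget's progress output.
/usr/bin/wget -q -O - --user='DOMAIN/your_ad_user' --password='userspassword' "$1" 2>/dev/null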
I don't think making live queries across the net (and in particular using sharepoint as a source) is a good thing. Russell says you're just doing this to provide a conversion point, and you'd be taking the data and dumping it into a standard, oracle table and the users would be running queries against that.
If this were happening frequently, we'd probably have archive-log-related concerns (we'd be generating lots more of them). Updating data inefficiently can be done any number of ways, and the only reason I bring it up here is just because you're making life more convenient and some people might try to kludge something new.
Appreciate the comments, John! Thank you very much.
I decided to use curl due to this article: http://daniel.haxx.se/docs/curl-vs-wget.html
Wget seems to be much less versatile, as it primarily works only with HTTP(S) and FTP.
Because External Tables using the preprocessor option need to have the url hardcoded in, it would be difficult to allow for the user input you mention.
You're right about the way we'd use this. We'd basically conduct a nightly (or so) load of the files and place that data into a standard table that users could then access.
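To sketch what that nightly load might look like (the table and job names here are made up, and the 2 AM schedule is just an example), something like this would copy the external table's current contents into a regular table once a day:

-- One-time setup: a normal table with the same shape as the external table.
create table urschema.SHAREPOINT_DATA as
  select * from urschema.GET_DATA_FROM_SHAREPOINT_TST where 1 = 0;

-- Nightly refresh job.
begin
  dbms_scheduler.create_job(
    job_name        => 'URSCHEMA.REFRESH_SHAREPOINT_DATA',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin
                          delete from urschema.SHAREPOINT_DATA;
                          insert into urschema.SHAREPOINT_DATA
                            select * from urschema.GET_DATA_FROM_SHAREPOINT_TST;
                          commit;
                        end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => true);
end;
/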
Thanks again for the discussion, I appreciate it very much.