Special Notes
1) This blog post does not only pertain to Blue Coat logs, but possibly other data sources as well.2) This is not a knock on Blue Coat, the app, TA, or any of that, it is just one example of many where we might want to change the way we send data to Splunk. Fortunately Blue Coat provides the means to do so. (hat tip)
Background info
A little while back, we were working on a custom Splunk app that included ingesting Blue Coat logs into a SOC's single pane of glass, but we were getting an error message of:Field extractor name=custom_client_events is unusually slow (max single event time=1146ms)
[custom_client_events]
REGEX = (?<date>[^\s]+)\s+(?<time>[^\s]+)\s+(?<duration>[^\s]+)\s+(?<src_ip>[^\s]+)\s+(?<user>[^\s]+)\s+(?<cs_auth_group>[^\s]+)\s+(?<x_exception_id>[^\s]+)\s+(?<filter_result>[^\s]+)\s+\"(?<category>[^\"]+)\"\s+(?<http_referrer>[^\s]+)\s+(?<status>[^\s]+)\s+(?<action>[^\s]+)\s+(?<http_method>[^\s]+)\s+(?<http_content_type>[^\s]+)\s+(?<cs_uri_scheme>[^\s]+)\s+(?<dest>[^\s]+)\s+(?<uri_port>[^\s]+)\s+(?<uri_path>[^\s]+)\s+(?<uri_query>[^\s]+)\s+(?<uri_extension>[^\s]+)\s+\"(?<http_user_agent>[^\"]+)\"\s+(?<dest_ip>[^\s]+)\s+(?<bytes_in>[^\s]+)\s+(?<bytes_out>[^\s]+)\s+\"*(?<x_virus_id>[^\"]+)\"*\s+\"*(?<x_bluecoat_application_name>[^\"]+)\"*\s+\"*(?<x_bluecoat_application_operation>[^\"]+)
The robustness and volume of data was simply too much for this type of extraction.
Solution
The solution is not to make Splunk adapt, but instead change the way data is sent to it. The Blue Coat app and TA require sending data in the bcreportermain_v1 format--which is an ELFF format. Then the Blue Coat app and TA try to parse this space separated data using the complex regex seen above. Instead of doing that, fortunately you can instruct Blue Coat to send the data in a different format such as key value pair--which Splunk likes and natively parses.In this case, have the Blue Coat admins define a custom log format with the following fields:
Bluecoat|date=$(date)|time=$(time)|duration=$(time-taken)|src_ip=$(c-ip)|user=$(cs-username)|cs_auth_group=$(cs-auth-group)| x_exception_id=$(x-exception-id)|filter_result=$(sc-filter-result)|category=$(cs-categories)|http_referrer=$(cs(Referer))|status=$(sc-status)|action=$(s-action)|http_method=$(cs-method)|http_content_type=$(rs(Content-Type))|cs_uri_scheme=$(cs-uri-scheme)|dest=$(cs-host)| uri_port=$(cs-uri-port)|uri_path=$(cs-uri-path)|uri_query=$(cs-uri-query)|uri_extension=$(cs-uri-extension)|http_user_agent=$(cs(User-Agent))|dest_ip=$(s-ip)|bytes_in=$(sc-bytes)|bytes_out=$(cs-bytes)|x_virus_id=$(x-virus-id)|x_bluecoat_application_name=$(x-bluecoat-application-name)|x_bluecoat_application_operation=$(x-bluecoat-application-operation)|target_ip=$(cs-ip)|proxy_name=$(x-bluecoat-appliance-name)|proxy_ip=$(x-bluecoat-proxy-primary-address)|$(x-bluecoat-special-crlf)
Since this data comes into Splunk as key=value pair now, Splunk parses it natively.
We just removed the TAs from the indexer and replaced it with a simpler props.conf file of this:
[bluecoat:proxysg:customclient]
SHOULD_LINEMERGE = false
This just turns off line merging which is on by default and makes the parsing even faster. Also remember to rename the props.conf and transforms.conf (ex: .bak files) included in the app if you have it installed on your search head--that contains the same complicated regex which will slow down data ingestion. Lastly, by defining your own format, you can add other fields you care about--such as the target IP (cs-ip) which is not included in the default bcreportermain_v1 format for some reason. We hope this helps others that run into this situation.