Skip to main content

User Defined Functions

User Defined Functions (UDFs) are a powerful feature that allows you to blend the flexibility of Python with SQL queries. With UDFs, you can call customized Python functions directly from the Query Analyzer, extending SQL capabilities to match your specific business requirements.

Building Custom UDFs

You can create your own UDFs tailored to your organization's needs:

  1. UDF Bricks — Build custom functions using UDF bricks via Brick Management in the Engine
  2. Python Flexibility — Write Python code that executes within your queries
  3. Reusable Logic — Once created, UDFs are available across all queries and detections
Need Help?

Our Solutions Team can help you design and build custom UDFs that match your specific business requirements. Contact your account representative to get started.


Built-in Examples

The following UDFs are included out-of-the-box to demonstrate common use cases. Use these as inspiration for building your own custom functions.

similarity() — Semantic Text Similarity

Computes semantic similarity between two text strings using sentence transformer models. Returns a score between 0.0 (completely different) and 1.0 (identical meaning).

Syntax:

-- 2 parameters (uses default model)
SELECT similarity(text1, text2);

-- 3 parameters (specify model)
SELECT similarity(text1, text2, model_name);

Available Models:

AliasModel NameLanguagesSpeedQuality
defaultparaphrase-multilingual-MiniLM-L12-v250+⚡⚡⚡ Fast⭐⭐⭐⭐
minilmall-MiniLM-L6-v2English⚡⚡⚡⚡ Fastest⭐⭐⭐⭐
mpnetparaphrase-multilingual-mpnet-base-v250+⚡⚡ Medium⭐⭐⭐⭐⭐

Examples:

-- Basic similarity
SELECT similarity('machine learning', 'deep learning');
-- Result: ~0.85

-- Semantic document search
SELECT
doc_id,
title,
similarity(content, 'suspicious network activity') as score
FROM documents
WHERE similarity(content, 'suspicious network activity') > 0.7
ORDER BY score DESC
LIMIT 10;

-- Duplicate detection
SELECT a.doc_id, b.doc_id,
similarity(a.content, b.content) as sim
FROM documents a
JOIN documents b ON a.doc_id < b.doc_id
WHERE similarity(a.content, b.content) > 0.95
ORDER BY sim DESC;

Score Interpretation:

ScoreMeaningUse Case
0.95+Near identicalDuplicate detection
0.80-0.95Very similarHigh-confidence matches
0.70-0.80SimilarSemantic search threshold
0.60-0.70Somewhat relatedBroad search
Less than 0.60Weakly relatedRarely useful

ipnet_contains() — IP Network Containment

Checks if an IP address is contained within a network range (CIDR notation).

-- Check if IP is in network
SELECT ipnet_contains('192.168.1.50', '192.168.1.0/24');
-- Result: true

-- Filter logs by internal network
SELECT * FROM access_logs
WHERE ipnet_contains(client_ip, '10.0.0.0/8');

-- Find external traffic
SELECT * FROM connections
WHERE NOT ipnet_contains(source_ip, '192.168.0.0/16');

Supports IPv4 and IPv6 addresses with CIDR notation.


network() — Extract Network Address

Extracts the network address from an IP/CIDR string (sets host bits to zero).

SELECT network('192.168.1.50/24');
-- Result: '192.168.1.0'

-- Group by network
SELECT network(ip_address) as subnet, COUNT(*) as count
FROM devices
GROUP BY network(ip_address);

broadcast() — Calculate Broadcast Address

Calculates the broadcast address for a given network (IPv4 only).

SELECT broadcast('192.168.1.0/24');
-- Result: '192.168.1.255'

-- List network ranges
SELECT
network_cidr,
network(network_cidr) as first_ip,
broadcast(network_cidr) as last_ip
FROM subnets;

family() — IP Address Family

Returns the IP address family (4 for IPv4, 6 for IPv6).

SELECT family('192.168.1.1');  -- Result: 4
SELECT family('2001:db8::1'); -- Result: 6

-- Filter by IP version
SELECT * FROM addresses WHERE family(ip) = 4;

json_path() — JSON Path Query

Extracts a single value from JSON using JSONPath syntax.

-- Extract simple field
SELECT json_path('{"name": "Alice", "age": 30}', '$.name');
-- Result: 'Alice'

-- Extract nested field
SELECT json_path('{"user": {"name": "Bob", "city": "NYC"}}', '$.user.city');
-- Result: 'NYC'

-- Parse API responses
SELECT
request_id,
json_path(response_body, '$.status') as status,
json_path(response_body, '$.data.user_id') as user_id
FROM api_logs;

to_timezone() — Timezone Conversion

Converts a timestamp from one timezone to another, handling daylight saving time automatically.

SELECT to_timezone('2024-04-01T00:00:20Z'::TIMESTAMP, 'Europe/Brussels');
-- Result: 2024-04-01 02:00:20 (during DST)

SELECT to_timezone(created_at, 'America/Los_Angeles') as pst_time
FROM events;

Supports all IANA timezone database names (e.g., 'America/New_York', 'Europe/London', 'Asia/Tokyo') and UTC offsets (e.g., '+01:00').


Create Your Own

Ready to build custom UDFs? Navigate to Engine → Brick Management to create UDF bricks with your own Python logic. Your custom functions will be available in the Query Analyzer just like the built-in examples above.