
Event Queue and Memcached unhealthy #480

@greycel

Description


Hi Team,

  1. Event Queue: issue retrieving a large dataset
    We've been receiving roughly 225,000 alerts per day from different log sources, and at last check there were 745,000 alerts for the last 7 days. For the last 10-15 days the Event Queue page has been slow to load and sometimes does not load at all, eventually throwing opensearchpy.exceptions.TransportError: TransportError(503, 'search_phase_execution_exception'). Below is the error log from the API service.
  • Current setup: AWS managed OpenSearch, with all reflex services (API, UI, Memcached, Agents) running on a single host. Would you suggest running the agents on a different host?
  • Any guidance on optimizing the performance of reflex-soar, and on tuning how it handles this event volume and the page load time, would be very much appreciated.
reflex-api  | 2024-03-19 15:10:46,704 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 15:10:46 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 15:11:06,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 15:11:06 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 15:11:09,692 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-events/_search [status:503 request:13.398s]
reflex-api  | 2024-03-19 15:11:21,807 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-events/_search [status:503 request:12.115s]
reflex-api  | 2024-03-19 15:11:26,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 15:11:26 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 15:11:35,095 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-events/_search [status:503 request:13.287s]
reflex-api  | 2024-03-19 15:11:46,574 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-events/_search [status:503 request:11.479s]
reflex-api  | [2024-03-19 15:11:46 +0000] [142] [ERROR] Exception on /api/v2.0/event [GET]
reflex-api  | Traceback (most recent call last):
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask/app.py", line 1523, in full_dispatch_request
reflex-api  |     rv = self.dispatch_request()
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask/app.py", line 1509, in dispatch_request
reflex-api  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask_restx/api.py", line 404, in wrapper
reflex-api  |     resp = resource(*args, **kwargs)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask/views.py", line 84, in view
reflex-api  |     return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask_restx/resource.py", line 46, in dispatch_request
reflex-api  |     resp = meth(*args, **kwargs)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/flask_restx/marshalling.py", line 244, in wrapper
reflex-api  |     resp = f(*args, **kwargs)
reflex-api  |   File "/app/api_v2/utils.py", line 223, in wrapper
reflex-api  |     return f(*args, **kwargs, current_user=current_user)
reflex-api  |   File "/app/api_v2/utils.py", line 354, in wrapper
reflex-api  |     return f(*args, **kwargs)
reflex-api  |   File "/app/api_v2/resource/event.py", line 366, in get
reflex-api  |     events = search.execute()
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearch_dsl/search.py", line 721, in execute
reflex-api  |     opensearch.search(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/client/utils.py", line 177, in _wrapped
reflex-api  |     return func(*args, params=params, headers=headers, **kwargs)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/client/__init__.py", line 1593, in search
reflex-api  |     return self.transport.perform_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/transport.py", line 405, in perform_request
reflex-api  |     raise e
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/transport.py", line 368, in perform_request
reflex-api  |     status, headers_response, data = connection.perform_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/connection/http_urllib3.py", line 275, in perform_request
reflex-api  |     self._raise_error(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/connection/base.py", line 300, in _raise_error
reflex-api  |     raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
reflex-api  | opensearchpy.exceptions.TransportError: TransportError(503, 'search_phase_execution_exception')
reflex-api  | 2024-03-19 15:20:16,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 15:20:16 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 15:27:07,039 - opensearch - WARNING - DELETE https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-threat-values-0.1.4/_doc/nvv7Vo4BUN9Zj4dfz3MO [status:404 request:0.044s]
reflex-api  | 2024-03-19 15:27:07,040 - ThreatPoller - ERROR - An error occurred while trying to purge expired values. NotFoundError(404, '{"_index":"reflex-threat-values-0.1.4","_id":"nvv7Vo4BUN9Zj4dfz3MO","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4885,"_primary_term":1}')
reflex-api  | 2024-03-19 15:27:07,040 - ThreatPoller - ERROR - An error occurred while trying to purge expired values. NotFoundError(404, '{"_index":"reflex-threat-values-0.1.4","_id":"nvv7Vo4BUN9Zj4dfz3MO","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4885,"_primary_term":1}')
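A 503 search_phase_execution_exception indicates that OpenSearch could not complete the search across the index's shards; on a large and growing reflex-events index, one possible cause is that individual queries have become too expensive for the cluster. A common mitigation is to keep each Event Queue request small: filter to a recent time window, fetch a short page, and page forward with search_after instead of large from offsets. The sketch below only builds such a request body as a plain dict. The function name and the created_at and uuid field names are hypothetical and are not taken from the Reflex codebase.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def build_event_page_query(
    days: int = 1,
    page_size: int = 25,
    search_after: Optional[List[Any]] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a time-bounded, search_after-paginated body for an events index.

    'created_at' (date) and 'uuid' (keyword) are hypothetical field names.
    """
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    body: Dict[str, Any] = {
        # Small pages keep each request cheap for the cluster
        "size": page_size,
        # Count hits only up to 10,000 (the OpenSearch default) instead of exactly
        "track_total_hits": 10000,
        "query": {
            "bool": {
                # A filter clause restricts the time window without scoring
                "filter": [{"range": {"created_at": {"gte": since.isoformat()}}}]
            }
        },
        # search_after needs a deterministic sort, so add a unique tiebreaker
        "sort": [{"created_at": "desc"}, {"uuid": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page
        body["search_after"] = search_after
    return body
```

For the next page, pass the sort values of the last hit from the previous response, e.g. build_event_page_query(search_after=resp["hits"]["hits"][-1]["sort"]). The body could then be sent with opensearch-py's client.search(index="reflex-events", body=body), using the index name seen in the log above.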
  2. After restarting all the reflex Docker services, the Memcached container state shows as unhealthy, and Memcached errors are appearing in the reflex-api service logs. The service is reachable and accepts connections when checked with "nc localhost 11211". I'm not sure which reflex components will be affected by this and would appreciate help.
reflex-api  | 2024-03-19 12:22:11,237 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-expired-tokens/_search [status:N/A request:0.003s]
reflex-api  | Traceback (most recent call last):
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
reflex-api  |     httplib_response = self._make_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
reflex-api  |     six.raise_from(e, None)
reflex-api  |   File "<string>", line 3, in raise_from
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
reflex-api  |     httplib_response = conn.getresponse()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 1322, in getresponse
reflex-api  |     response.begin()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 303, in begin
reflex-api  |     version, status, reason = self._read_status()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 272, in _read_status
reflex-api  |     raise RemoteDisconnected("Remote end closed connection without"
reflex-api  | http.client.RemoteDisconnected: Remote end closed connection without response
reflex-api  |
reflex-api  | During handling of the above exception, another exception occurred:
reflex-api  |
reflex-api  | Traceback (most recent call last):
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/connection/http_urllib3.py", line 249, in perform_request
reflex-api  |     response = self.pool.urlopen(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 798, in urlopen
reflex-api  |     retries = retries.increment(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/util/retry.py", line 525, in increment
reflex-api  |     raise six.reraise(type(error), error, _stacktrace)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
reflex-api  |     raise value.with_traceback(tb)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
reflex-api  |     httplib_response = self._make_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
reflex-api  |     six.raise_from(e, None)
reflex-api  |   File "<string>", line 3, in raise_from
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
reflex-api  |     httplib_response = conn.getresponse()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 1322, in getresponse
reflex-api  |     response.begin()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 303, in begin
reflex-api  |     version, status, reason = self._read_status()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 272, in _read_status
reflex-api  |     raise RemoteDisconnected("Remote end closed connection without"
reflex-api  | urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
reflex-api  | 2024-03-19 12:22:11,304 - opensearch - WARNING - POST https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-expired-tokens/_search [status:N/A request:0.003s]
reflex-api  | Traceback (most recent call last):
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
reflex-api  |     httplib_response = self._make_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
reflex-api  |     six.raise_from(e, None)
reflex-api  |   File "<string>", line 3, in raise_from
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
reflex-api  |     httplib_response = conn.getresponse()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 1322, in getresponse
reflex-api  |     response.begin()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 303, in begin
reflex-api  |     version, status, reason = self._read_status()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 272, in _read_status
reflex-api  |     raise RemoteDisconnected("Remote end closed connection without"
reflex-api  | http.client.RemoteDisconnected: Remote end closed connection without response
reflex-api  |
reflex-api  | During handling of the above exception, another exception occurred:
reflex-api  |
reflex-api  | Traceback (most recent call last):
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/opensearchpy/connection/http_urllib3.py", line 249, in perform_request
reflex-api  |     response = self.pool.urlopen(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 798, in urlopen
reflex-api  |     retries = retries.increment(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/util/retry.py", line 525, in increment
reflex-api  |     raise six.reraise(type(error), error, _stacktrace)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
reflex-api  |     raise value.with_traceback(tb)
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
reflex-api  |     httplib_response = self._make_request(
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 466, in _make_request
reflex-api  |     six.raise_from(e, None)
reflex-api  |   File "<string>", line 3, in raise_from
reflex-api  |   File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.8/site-packages/urllib3/connectionpool.py", line 461, in _make_request
reflex-api  |     httplib_response = conn.getresponse()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 1322, in getresponse
reflex-api  |     response.begin()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 303, in begin
reflex-api  |     version, status, reason = self._read_status()
reflex-api  |   File "/usr/local/lib/python3.8/http/client.py", line 272, in _read_status
reflex-api  |     raise RemoteDisconnected("Remote end closed connection without"
reflex-api  | urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
reflex-api  | [2024-03-19 12:22:24 +0000] [138] [ERROR] Error checking memcached for event-processing-9d5d9c70-0261-4453-bc87-b078baf39a5c: timed out
reflex-api  | 2024-03-19 12:22:24,710 - app - ERROR - Error checking memcached for event-processing-9d5d9c70-0261-4453-bc87-b078baf39a5c: timed out
reflex-api  | [2024-03-19 12:22:24 +0000] [142] [ERROR] Error checking memcached for event-processing-e3840800-73b4-49c2-94ac-2e4d590c8f3d: timed out
reflex-api  | 2024-03-19 12:22:24,799 - app - ERROR - Error checking memcached for event-processing-e3840800-73b4-49c2-94ac-2e4d590c8f3d: timed out
reflex-api  | [2024-03-19 12:22:44 +0000] [139] [ERROR] Error checking memcached for event-processing-76099e0b-446b-451f-b614-36e0280c03c7: timed out
reflex-api  | 2024-03-19 12:22:44,582 - app - ERROR - Error checking memcached for event-processing-76099e0b-446b-451f-b614-36e0280c03c7: timed out
reflex-api  | [2024-03-19 12:22:44 +0000] [140] [ERROR] Error checking memcached for event-processing-dd4c4f65-3ed8-4342-9677-008cf156d215: timed out
reflex-api  | 2024-03-19 12:22:44,647 - app - ERROR - Error checking memcached for event-processing-dd4c4f65-3ed8-4342-9677-008cf156d215: timed out
reflex-api  | [2024-03-19 12:22:45 +0000] [143] [ERROR] Error checking memcached for event-processing-64dfb364-edb8-4aef-9fde-89395b06a536: timed out
reflex-api  | 2024-03-19 12:22:45,108 - app - ERROR - Error checking memcached for event-processing-64dfb364-edb8-4aef-9fde-89395b06a536: timed out
reflex-api  | [2024-03-19 12:22:45 +0000] [144] [ERROR] Error checking memcached for event-processing-fd9bc87f-97a6-4cb1-b03a-98197b7c921b: timed out
reflex-api  | 2024-03-19 12:22:45,243 - app - ERROR - Error checking memcached for event-processing-fd9bc87f-97a6-4cb1-b03a-98197b7c921b: timed out
reflex-api  | [2024-03-19 12:23:05 +0000] [137] [ERROR] Error checking memcached for event-processing-afca4f5b-c429-4172-bb03-405dbd7feb6a: timed out
reflex-api  | 2024-03-19 12:23:05,799 - app - ERROR - Error checking memcached for event-processing-afca4f5b-c429-4172-bb03-405dbd7feb6a: timed out
reflex-api  | 2024-03-19 12:23:06,704 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 12:23:06 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 12:36:46,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 12:36:46 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 12:54:16,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 12:54:16 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 12:57:06,705 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 12:57:06 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:15:16,704 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:15:16 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:17:06,953 - opensearch - WARNING - DELETE https://vpc-reflexsoar-2ilwlvjslfbp4il4dnta5b7twe.us-east-1.es.amazonaws.com:443/reflex-threat-values-0.1.4/_doc/V_CHVo4BUN9Zj4df375B [status:404 request:0.030s]
reflex-api  | 2024-03-19 13:17:06,953 - ThreatPoller - ERROR - An error occurred while trying to purge expired values. NotFoundError(404, '{"_index":"reflex-threat-values-0.1.4","_id":"V_CHVo4BUN9Zj4df375B","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5144,"_primary_term":1}')
reflex-api  | 2024-03-19 13:17:06,953 - ThreatPoller - ERROR - An error occurred while trying to purge expired values. NotFoundError(404, '{"_index":"reflex-threat-values-0.1.4","_id":"V_CHVo4BUN9Zj4df375B","_version":3,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":5144,"_primary_term":1}')
reflex-api  | 2024-03-19 13:19:06,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:19:06 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:25:16,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:25:16 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:25:36,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:25:36 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:35:36,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:35:36 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:51:56,704 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:51:56 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 13:57:06,703 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 13:57:06 UTC)" skipped: maximum number of running instances reached (1)
reflex-api  | 2024-03-19 14:10:36,704 - apscheduler.scheduler - WARNING - Execution of job "DetectionState.check_state (trigger: interval[0:00:10], next run at: 2024-03-19 14:10:36 UTC)" skipped: maximum number of running instances reached (1)
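A successful "nc localhost 11211" only proves that the port accepts TCP connections; it does not show that memcached answers commands within a deadline, which is what the "timed out" errors in the log suggest is failing. A stricter probe can send the memcached text-protocol "version" command with a short timeout. This is a minimal sketch: the function name, the 127.0.0.1 host, and the 2-second default timeout are assumptions, not part of Reflex or of the container's existing healthcheck.

```python
import socket


def memcached_version(host: str = "127.0.0.1", port: int = 11211, timeout: float = 2.0) -> str:
    """Send the text-protocol 'version' command and return the server version.

    Raises OSError (including socket.timeout) if the server does not answer
    in time, and ValueError if the reply is not a VERSION line.
    """
    # create_connection applies the timeout to connect() and to later reads
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"version\r\n")
        reply = b""
        # Read until the reply line terminator arrives or the peer closes
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break
            reply += chunk
    line = reply.decode("ascii", errors="replace").strip()
    if not line.startswith("VERSION "):
        raise ValueError(f"unexpected reply: {line!r}")
    return line[len("VERSION "):]
```

Running the probe from inside the reflex-api container, with the host set to whatever name the API uses for memcached, would test the same network path the API uses rather than only the memcached container's own loopback.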

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
