74 commits

Author SHA1 Message Date
ceed3ed4bd
delete manual rule overrides and handle it with helm values instead 2024-03-17 12:36:18 +01:00
ae67aa88e9
prometheus/values.yaml: Fix pod antiaffinity 2024-03-16 13:06:47 +01:00
7e20805c34
change grafana volume to RWX 2024-02-23 13:00:56 +01:00
285c0b53d3
fix update strategy 2024-02-23 12:51:54 +01:00
986cdef4d6
set Updatestrategy for grafana (this should fix ) 2024-02-23 12:46:11 +01:00
a181eb3fec
Revert "try to fix prometheus"
This reverts commit d4727c0923.
2024-02-20 20:31:36 +01:00
d4727c0923
try to fix prometheus 2024-02-20 20:26:06 +01:00
8f15467a36
try to fix 3 2024-02-18 06:59:06 +01:00
bce6e8f315
switch to traefik 2 2024-02-18 06:17:03 +01:00
5654baa437
add pod affinity for alertmanager 2024-02-09 04:41:59 +01:00
878a2de21d
change prometheus metric storage values 2024-02-06 22:27:49 +01:00
3c02a16714
Prometheus: remove waiting time for KubeNodeUnreachable Alert 2024-02-01 22:44:57 +01:00
921306dcdc
PROMETHEUS: move alerts to this repo to allow modifications 2024-02-01 22:00:37 +01:00
a2a306c195
add inhibition rule to alertmanager 2024-01-31 16:19:42 +01:00
30b7c96833
add ECC alert (closes ) 2024-01-29 19:28:16 +01:00
0be2949c50
rework storage to reduce backup load 2024-01-26 13:39:13 +01:00
dad89f524c
prometheus/values.yaml: Prevent all replicas on the same node 2023-12-25 10:19:03 +01:00
5a9bb1850e
change alert inhibition rules 2023-12-18 17:33:47 +01:00
4c6bf59f9e
prometheus/values.yaml: avoid all pods on the same node 2023-11-26 20:41:43 +01:00
11f471a711
prometheus/alerts.yaml: increase temperature limit to 90 2023-11-25 18:21:45 +01:00
cf76be1d39
add longhorn monitoring 2023-11-24 20:32:50 +01:00
a441ff630b
Prometheus: change DiskspaceLow Alert 2023-11-23 20:35:46 +01:00
2207baf8e2
fix type error 2023-10-23 18:32:49 +02:00
8c5f6beca7
add label to prometheus namespace 2023-10-23 18:31:47 +02:00
e5cd0a214f
Tell Prometheus to only pick up rules from namespaces with label "prometheus: yolokube" 2023-10-23 18:05:29 +02:00
53be807c0b
Update prometheus/ingress.yaml 2023-09-20 22:15:41 +02:00
d22605c1d9
fix alertmanager 2023-09-15 01:43:41 +02:00
94c2a34aac
try to fix prometheus 2 2023-08-31 00:29:12 +02:00
778306127f
try to fix prometheus
try to fix prometheus 2

try to fix prometheus 3
2023-08-30 22:56:03 +02:00
ffaf6a079e
put alertmanager config back into helm values 2023-08-30 21:27:13 +02:00
69dde5d035
enable persistence for grafana 2023-06-29 12:02:54 +02:00
deba86906d
revert memory rule changes (back to 80%)
Signed-off-by: Tom Neuber <tomneuber@web.de>
2023-06-24 18:50:24 +02:00
812cd1efa6
Alerting: edit rules for storage low 2023-06-24 09:56:07 +02:00
78793ed440
Monitoring: change prometheus values to prevent sync-loop in argo 2023-06-24 07:28:58 +02:00
c4033903b4
Monitoring: add node tag to node-exporter metrics 2023-06-23 19:19:48 +02:00
fd6cc7ef3d
add etcdbackup alerts 2023-06-22 19:59:25 +02:00
d75cb6b7b6
change memory rule 2023-06-20 14:41:39 +02:00
e63707d16c
try to fix prometheus deployment 6 (final) (for now) 2023-06-20 13:15:28 +02:00
c706f9b61e
try to fix prometheus deployment 5 2023-06-20 10:08:22 +02:00
23a3a50c3d
try to fix prometheus deployment 4 2023-06-20 09:25:19 +02:00
953ee8e085
try to fix prometheus deployment 3 2023-06-20 09:06:13 +02:00
549cfac957
try to fix prometheus deployment 2 2023-06-20 08:56:06 +02:00
8c065d71ce
try to fix prometheus deployment 2023-06-20 08:54:19 +02:00
d5985f50b5
change prometheus to prometheus-operator with kube-prometheus, this includes grafana 2023-06-20 08:43:48 +02:00
93b8c785a1
add the ingress class to the ingresses to improve compatibility in the early stages of the cluster creation, where the default class is not yet propagated. 2023-06-19 07:15:19 +02:00
79a22afb98
changes to the ingresses 2023-06-19 07:01:14 +02:00
115c128c60
further trim down the ingress resource to the bare minimum 2023-06-19 00:50:20 +02:00
82294d3cf5
fck nginx-ingress, use loadbalancer for https 2023-06-18 05:18:02 +02:00
03df120bbe
switch prometheus to letsencrypt staging to debug the basicauth ingresses 2023-06-18 04:26:53 +02:00
9972c1598c
reload alertmanager config automatically 2023-06-17 09:11:18 +02:00