Describe the bug
Using stdin as input and es as output, processing stops and the engine shuts down when a JSON line longer than 16K is encountered. Filtering out all lines longer than 16K from the stdin stream avoids the problem (e.g. piping through "cut -c1-16000"). This concern was already raised in #89. I tried setting the Buffer_Size property to various values, but it did not seem to have any effect. I would like the lines longer than 16K to be included in the output.
To Reproduce
Create an NDJSON file with one line longer than 16K.
Example
{"message":"<some string longer than 16k>","stream":"stdout","@timestamp":"2018-06-11T14:37:30.681701731Z"}
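One way to produce such a file is sketched below. The field names mirror the example above; the 20000-character message length, the two short surrounding records, and the function names are assumptions chosen only for illustration.

```python
import json


def build_records(long_len=20000):
    """Return two short records and one whose JSON line exceeds 16K.

    long_len is an assumed message length, picked to be safely over 16K.
    """
    ts = "2018-06-11T14:37:30.681701731Z"
    messages = ["short line before", "x" * long_len, "short line after"]
    return [{"message": m, "stream": "stdout", "@timestamp": ts} for m in messages]


def write_ndjson(path, records):
    """Write one JSON object per line (NDJSON)."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


if __name__ == "__main__":
    # Produces your.ndjson, the file name used in the steps below.
    write_ndjson("your.ndjson", build_records())
```

Placing the long record between two short ones makes it easy to see whether the records after the long line still reach the output.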
Steps to reproduce the problem:
Execute: cat your.ndjson | ${FLUENT_BIT_ROOT}/bin/td-agent-bit -i stdin -o es -p Host=127.0.0.1 -p Port=9200 -p Index=yourindex -p Buffer_Size=32k
Expected behavior
Assuming I am interpreting the resolution of issue #89 correctly, we should see one record in the target ES index for every line in the input NDJSON that does not exceed the buffer size. We should also be able to configure the buffer size on stdin so that lines longer than 16K are processed as input.
Screenshots
Copyright (C) Treasure Data
[2020/05/24 12:39:31] [ info] [storage] version=1.0.3, initializing...
[2020/05/24 12:39:31] [ info] [storage] in-memory
[2020/05/24 12:39:31] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/05/24 12:39:31] [ info] [engine] started (pid=28932)
[2020/05/24 12:39:31] [ info] [sp] stream processor started
[2020/05/24 12:39:42] [ warn] [in_stdin] end of file (stdin closed by remote end)
[2020/05/24 12:39:42] [ info] [input] pausing stdin.0
[2020/05/24 12:39:42] [ warn] [engine] service will stop in 5 seconds
[2020/05/24 12:39:47] [ info] [engine] service stopped
[2020/05/24 12:39:47] [ info] [input] pausing stdin.0
Environment name and version (e.g. Kubernetes? What version?): Running from bash 4.2.46(2)-release
Server type and version: Oracle VM VirtualBox Windows
Operating System and version: Centos 7
Filters and plugins: None
Additional context
I can work around the problem by cutting out the problematic lines. However, from reading the prior issue, I am hoping that this measure is (or will be) unnecessary, so that all lines in the input file can be loaded and the log data is more complete.
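For reference, a minimal sketch of that workaround as a filter script. Note that "cut -c1-16000" truncates long lines rather than dropping them, which leaves invalid JSON behind, so this version skips whole lines instead. The 16000-byte limit simply mirrors the cut command; the script name drop_long_lines.py and the function name are hypothetical.

```python
import sys

# Assumed threshold, chosen to mirror the "cut -c1-16000" workaround.
LIMIT = 16000


def is_within_limit(line, limit=LIMIT):
    """Return True if the line, without its trailing newline, fits in limit bytes."""
    return len(line.rstrip("\n").encode("utf-8")) <= limit


if __name__ == "__main__":
    skipped = 0
    for line in sys.stdin:
        if is_within_limit(line):
            sys.stdout.write(line)
        else:
            skipped += 1
    # Report on stderr so the NDJSON stream on stdout stays clean.
    print(f"skipped {skipped} oversized line(s)", file=sys.stderr)
```

It would sit in the pipeline where cut was used: cat your.ndjson | python3 drop_long_lines.py | ${FLUENT_BIT_ROOT}/bin/td-agent-bit -i stdin -o es ... The skipped lines are still lost, which is the gap this report is about.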