Streaming data to XML parser #6

Closed
msasikanth opened this issue Jun 4, 2024 · 3 comments


msasikanth commented Jun 4, 2024

Hi, I currently use this library in my RSS reader app called Twine. One thing it was missing was the ability to stream data to the parser, or to parse data in chunks, rather than loading the entire string into memory. This was causing OOM crashes with large strings.

Since I receive the response as a ByteReadChannel from the Ktor client, I converted it into a CharIterator and parsed the data in chunks. So far it works well in my testing.

I wanted to share my approach, get your thoughts on it, and leave it to anyone looking to do this.

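// Imports assume Ktor 2.x and kotlinx.coroutines.
import io.ktor.utils.io.ByteReadChannel
import io.ktor.utils.io.core.readText
import kotlinx.coroutines.runBlocking
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Exposes the channel as a CharIterator that pulls the body in small chunks.
private fun ByteReadChannel.toCharIterator(
  context: CoroutineContext = EmptyCoroutineContext
): CharIterator {
  return object : CharIterator() {

    // Maximum number of bytes requested from the channel per refill.
    private val DEFAULT_BUFFER_SIZE = 1024L

    private var currentIndex = 0
    private var currentBuffer = CharArray(0)

    override fun hasNext(): Boolean {
      // Characters are still left in the current chunk.
      if (currentIndex < currentBuffer.size) return true
      // Nothing is buffered and the channel is finished.
      if (this@toCharIterator.isClosedForRead) return false

      // Block the caller until the next chunk is available.
      val packet = runBlocking(context) {
        this@toCharIterator.readRemaining(DEFAULT_BUFFER_SIZE)
      }
      currentBuffer = packet.readText().toCharArray()
      packet.release()
      currentIndex = 0
      return currentBuffer.isNotEmpty()
    }

    override fun nextChar(): Char {
      if (!hasNext()) throw NoSuchElementException()
      return currentBuffer[currentIndex++]
    }
  }
}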
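
One thing to watch with fixed-size chunks: a 1024-byte read can end in the middle of a multi-byte UTF-8 sequence, and decoding each chunk on its own may then garble or reject that character (the exact behaviour probably depends on the Ktor version). Below is a minimal sketch of one way to guard against that, written against the plain Kotlin standard library with no Ktor dependency. `nextChunk` is a hypothetical stand-in for the channel read, and `Utf8ChunkIterator` and `incompleteTailLength` are illustrative names, not part of any library.

```kotlin
/**
 * Returns how many bytes at the end of bytes[0, length) begin a UTF-8
 * sequence without finishing it (0 if the data ends on a character boundary).
 */
fun incompleteTailLength(bytes: ByteArray, length: Int): Int {
  var i = length - 1
  val stop = maxOf(0, length - 4) // a UTF-8 sequence is at most 4 bytes long
  while (i >= stop) {
    val b = bytes[i].toInt() and 0xFF
    if ((b and 0xC0) != 0x80) { // not a continuation byte, so a lead byte
      val expected = when {
        b < 0x80 -> 1
        (b and 0xE0) == 0xC0 -> 2
        (b and 0xF0) == 0xE0 -> 3
        (b and 0xF8) == 0xF0 -> 4
        else -> 1 // invalid lead byte: leave it to the decoder
      }
      val available = length - i
      return if (available < expected) available else 0
    }
    i--
  }
  return 0 // only continuation bytes found: malformed, leave it to the decoder
}

/** CharIterator over UTF-8 chunks; nextChunk returns null at end of input. */
class Utf8ChunkIterator(private val nextChunk: () -> ByteArray?) : CharIterator() {
  private var pending = ByteArray(0) // undecoded tail carried over from the last chunk
  private var chars = ""
  private var index = 0
  private var exhausted = false

  override fun hasNext(): Boolean {
    while (index >= chars.length && !exhausted) refill()
    return index < chars.length
  }

  override fun nextChar(): Char {
    if (!hasNext()) throw NoSuchElementException()
    return chars[index++]
  }

  private fun refill() {
    val chunk = nextChunk()
    if (chunk == null) {
      exhausted = true
      chars = pending.decodeToString() // flush leftovers; malformed bytes become U+FFFD
      pending = ByteArray(0)
    } else {
      val data = pending + chunk
      val end = data.size - incompleteTailLength(data, data.size)
      chars = data.decodeToString(0, end)        // decode only complete sequences
      pending = data.copyOfRange(end, data.size) // keep the partial one for next time
    }
    index = 0
  }
}
```

In the snippet above, `nextChunk` could wrap the same `runBlocking(context) { readRemaining(...) }` call and return the packet's bytes, or `null` once the channel is closed for read. The cost is one extra array copy per chunk in exchange for never decoding half a character.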

msasikanth commented

Another approach I tried was the following, but it had parsing issues and failed inconsistently. It seemed like a race condition.

private fun ByteReadChannel.toCharIterator(
  context: CoroutineContext = EmptyCoroutineContext
): CharIterator {
  val channel = this
  return object : CharIterator() {

    private val byteArrayPool = ByteArrayPool
    private var currentIndex = 0
    private var currentBuffer = ""

    private fun refillBuffer() {
      val byteArray = byteArrayPool.borrow()

      runBlocking(context) {
        val bytesRead = channel.readAvailable(byteArray)

        if (bytesRead != -1) {
          currentBuffer = byteArray.commonToUtf8String()
          currentIndex = 0
        }
      }
    }

    override fun hasNext(): Boolean {
      return if (currentIndex == currentBuffer.length) {
        refillBuffer()
        currentIndex < currentBuffer.length
      } else {
        true
      }
    }

    override fun nextChar(): Char {
      if (!hasNext()) throw NoSuchElementException()
      return currentBuffer[currentIndex++]
    }
  }
}
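
A possible explanation for the inconsistent failures, offered as an assumption rather than something confirmed in this thread: `readAvailable` fills only the first `bytesRead` bytes of the array, while `commonToUtf8String()` called without arguments presumably decodes the whole array (as its okio namesake does). Whatever sits after `bytesRead`, such as zeros or data left over from an earlier read into the same array, would then end up in the parsed text, and because each read returns a varying amount, the corruption would look random. The borrowed array is also never returned to the pool. The sketch below reproduces the effect with the plain Kotlin standard library; `fakeRead` and `decodeAll` are hypothetical helpers standing in for the channel read and the refill loop.

```kotlin
// Hypothetical stand-in for channel.readAvailable(buffer): copies up to
// buffer.size bytes from src starting at offset and returns the count,
// or -1 once src is exhausted.
fun fakeRead(src: ByteArray, offset: Int, buffer: ByteArray): Int {
  if (offset >= src.size) return -1
  val n = minOf(buffer.size, src.size - offset)
  src.copyInto(buffer, destinationOffset = 0, startIndex = offset, endIndex = offset + n)
  return n
}

// Reads src through one reused buffer (like a pooled array) and decodes each read.
// respectBytesRead = false mimics decoding the whole array on every read.
fun decodeAll(src: ByteArray, bufferSize: Int, respectBytesRead: Boolean): String {
  val buffer = ByteArray(bufferSize)
  val out = StringBuilder()
  var offset = 0
  while (true) {
    val n = fakeRead(src, offset, buffer)
    if (n == -1) break
    offset += n
    out.append(
      if (respectBytesRead) buffer.decodeToString(0, n) // only the bytes just read
      else buffer.decodeToString()                      // includes stale bytes past n
    )
  }
  return out.toString()
}
```

Calling `decodeAll("<rss><channel/></rss>".encodeToByteArray(), 8, respectBytesRead = true)` returns the input unchanged, while passing `false` appends leftover bytes from the previous read after the closing tag. If this is indeed the cause, decoding only the range `0 until bytesRead` would be the fix; splitting multi-byte characters across reads remains a separate concern.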


stefanhaustein commented Jun 5, 2024

Thanks for sharing!

The shorter snippet (the first variant, without refillBuffer) looks cleaner to me, but I don't immediately see the problem with the other variant... ¯\_(ツ)_/¯

P.S. I have linked this from the main README so it's easily discoverable.

stefanhaustein commented

P.P.S. Closing this, as it's not really an "open" issue.
