Dataset readers#

A set of reader classes makes it convenient to process datasets stored in files: TextFormatReader, DSJsonFormatReader, and CacheFormatReader.

These classes let you iterate over the contents of a file as parsed examples. Each reader adapts automatically depending on whether the Workspace is configured for single-line or multiline inputs.
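To make the single-line versus multiline distinction concrete, here is a minimal pure-Python sketch of how lines in a text-format file could be grouped in each mode. The helper `iter_text_examples` is hypothetical and is not part of `vowpal_wabbit_next`. It assumes the usual Vowpal Wabbit text convention that a blank line ends a multiline example, and it yields raw strings rather than parsed examples.

```python
from typing import Iterable, Iterator, List, Union


def iter_text_examples(
    lines: Iterable[str], multiline: bool
) -> Iterator[Union[str, List[str]]]:
    """Hypothetical helper: group raw text-format lines into examples."""
    if not multiline:
        # Single-line mode: every non-blank line is one example.
        for line in lines:
            if line.strip():
                yield line.rstrip("\n")
        return

    # Multiline mode: consecutive non-blank lines form one example,
    # and a blank line closes the current group.
    group: List[str] = []
    for line in lines:
        if line.strip():
            group.append(line.rstrip("\n"))
        elif group:
            yield group
            group = []
    if group:
        # Flush the last group if the input does not end with a blank line.
        yield group


raw = ["shared | s1\n", "| a1\n", "| a2\n", "\n", "shared | s2\n", "| b1\n"]
single = list(iter_text_examples(raw, multiline=False))
multi = list(iter_text_examples(raw, multiline=True))
```

In single-line mode the loop body receives one item per line, while in multiline mode it receives one list per group. This is why the same `for example in reader` loop can serve both kinds of Workspace.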

TextFormatReader#

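import vowpal_wabbit_next as vw

# Default workspace: single-line examples.
workspace = vw.Workspace()

with open("example.txt", "r") as text_file:
    # The reader parses the file contents into examples for this workspace.
    with vw.TextFormatReader(workspace, text_file) as reader:
        for example in reader:
            # Predict first, then update the model with the same example.
            print(workspace.predict_then_learn_one(example))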
0.0
0.14874061942100525
0.008019516244530678
-0.09955745935440063
-0.23391607403755188

DSJsonFormatReader#

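import vowpal_wabbit_next as vw

# --cb_explore_adf configures a multiline (contextual bandit) workspace.
workspace = vw.Workspace(["--cb_explore_adf"])

with open("example.dsjson", "r") as dsjson_file:
    with vw.DSJsonFormatReader(workspace, dsjson_file) as reader:
        for example in reader:
            # Predict first, then update the model with the same example.
            print(workspace.predict_then_learn_one(example))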
[(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)]
[(0, 0.25), (1, 0.25), (2, 0.25), (3, 0.25)]
[(0, 0.9624999761581421), (3, 0.012500000186264515), (1, 0.012500000186264515), (2, 0.012500000186264515)]
[(0, 0.9624999761581421), (3, 0.012500000186264515), (1, 0.012500000186264515), (2, 0.012500000186264515)]
[(0, 0.9624999761581421), (3, 0.012500000186264515), (1, 0.012500000186264515), (2, 0.012500000186264515)]
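The predictions printed above appear to be lists of (action, probability) pairs, as is usual for `--cb_explore_adf`. As a rough sketch of how such a list might be consumed, the snippet below checks that the probabilities sum to one, picks the most probable action, and samples an action from the distribution. The `sample_action` helper and the rounded probabilities are illustrative assumptions and are not part of the library.

```python
import random
from typing import List, Tuple

ActionProbs = List[Tuple[int, float]]


def sample_action(pmf: ActionProbs, rng: random.Random) -> Tuple[int, float]:
    """Hypothetical helper: draw one (action, probability) pair from a PMF."""
    weights = [prob for _, prob in pmf]
    # random.Random.choices draws indices in proportion to the given weights.
    chosen = rng.choices(range(len(pmf)), weights=weights, k=1)[0]
    return pmf[chosen]


# Rounded version of the later predictions shown in the output above.
pmf = [(0, 0.9625), (3, 0.0125), (1, 0.0125), (2, 0.0125)]

total = sum(prob for _, prob in pmf)
greedy_action = max(pmf, key=lambda pair: pair[1])[0]
action, prob = sample_action(pmf, random.Random(0))
```

Keeping the sampled probability alongside the action is useful because contextual bandit logs typically record the probability with which the chosen action was selected.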

CacheFormatReader#

See Cache format for an example of using the cache format.