
Falcon was built using custom tooling and leverages a unique data pipeline that can extract high-quality content out of web data and use it for training a custom codebase, independent from the works of NVIDIA, Microsoft, or HuggingFace.
A particular focus was put on data quality at scale. LLMs are notoriously sensitive to the quality of their training data, so significant care was taken in building a data pipeline that would both scale to tens of thousands of CPU cores for fast processing, a...
More