Learnings from Haskell in production

Sreenidhi Nair

# Haskell at ByteAlly

* Haskell from the beginning, for the last 8 years.
* FMap - A transpiler that converts a Haskell EDSL to mobile platforms, generating human-readable code.
* FRF - A workflow library that makes business processes easy to define and visualize.
* TypeQL - A query language for the internet.

Today: we're going to focus on TypeQL

-----

# TypeQL

* Typed query language.
* Intended for stitching together data from different sources and processing it.
    * SQL
    * Web APIs
    * CSV
    * GPU computations
* Currently being used in internal projects.

-----

# Actions

## TypeQL

    usrs = Read DB User {}

## Haskell

    type Usrs = Read DB User '[]

* Translated to Haskell type-level code to
    1. Avoid writing a custom type checker.
    2. Ensure everything is statically checked.

-----

# Combinators

## TypeQL

    usrCourses = Product
      { usrs = Read DB User {}
      , courses = Api Course {}
      }
      {
      }

## Haskell

    type UsrCourses = Product
      '[ "usrs" := Read DB User '[]
       , "courses" := Api Course '[]
       ]
      '[
       ]

-----

# Filters

## TypeQL

    usrsAbove20 = Read DB User
      { FilterBy age :using-hs `>= 20`
      }

## Haskell

    type Usrs = Read DB User
      '[ FilterBy '[S "age"] `Using` >= 20
       ]

- `:using-{lang}`. Currently Haskell is supported; JS support is close to being added.

-----

# Validations

Powered by reifying and analyzing queries

## Invalid targeting

    usrs = Read DB User { FilterBy time }

## Filter type mismatch

    usrs = Read DB User { Avg name }

## Input type mismatch

    usrsAbove20 = Read DB User { FilterBy age :using-hs `+ 1` }

-----

# Reifying query

- The generated type-level query is converted into a richer (type-level) AST called a tree.
- Validations are performed on the reified tree.
- The reification is done with (closed) type families.

      type family F a where
        F Int = Int
        F a = TypeError ('Text "It isn't an Int")

- Custom type errors for emitting domain-specific errors.
- The reification is also used for editor completions.
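As a rough illustration of how a closed type family plus a custom type error can reject an invalid target (such as `FilterBy time` on a `User`), here is a minimal sketch. It is not TypeQL's actual reification code: `UserSchema`, `FieldType`, `defaultAge`, and `defaultName` are hypothetical names invented for this example, and the schema is modelled as a plain type-level list of name/type pairs.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (ErrorMessage (..), Symbol, TypeError)

-- Hypothetical schema (an assumption for this sketch): field names paired
-- with their types, kept as a type-level list.
type UserSchema = '[ '("name", String), '("age", Int) ]

-- Closed type family: resolve a field name to its type, or stop with a
-- domain-specific error when the field does not exist in the schema.
type family FieldType (fld :: Symbol) (schema :: [(Symbol, Type)]) :: Type where
  FieldType fld '[] =
    TypeError ('Text "Invalid target: no field named " ':<>: 'ShowType fld)
  FieldType fld ('(fld, t) ': rest) = t
  FieldType fld ('(other, t) ': rest) = FieldType fld rest

-- These signatures type-check only because both fields exist in UserSchema.
defaultAge :: FieldType "age" UserSchema
defaultAge = 20

defaultName :: FieldType "name" UserSchema
defaultName = "person1"

-- Uncommenting the signature below should be rejected at compile time with
-- the custom message, since UserSchema has no "time" field:
-- badTarget :: FieldType "time" UserSchema
-- badTarget = undefined
```

Loading this in GHCi and evaluating `defaultAge` shows that the lookup resolves to an ordinary `Int`; the check happens entirely at compile time.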
-----

# Extensibility

* Actions - SQL, WebApi, CSV, GPU
* Combinators - Joins, And, Or
* Filters - FilterBy, GroupBy, Avg, Sum

-----

# Modelling ADTs

Avoid the boilerplate of writing separate types that differ only in a few field types.

With boilerplate:

    newtype Age = Age { getAge :: Natural }
    newtype Department = Dept { getDept :: Text }

    data User = User
      { name :: Text
      , age :: Age
      , dept :: Department
      }

    data UserAvgAge = UserAvgAge
      { name :: Text
      , avgAge :: Double
      , dept :: Department
      }

-----

# Modelling ADTs (2)

HList representation, no boilerplate:

    data HList xs where
      (:>) :: x -> HList xs -> HList (fld ::: x ': xs)
      HNil :: HList '[]

    type User = HList
      '[ "name" ::: Identity Text
       , "age" ::: Identity Age
       , "dept" ::: Identity Department
       ]

    type UserAvgAge = HList
      '[ "name" ::: Identity Text
       , "age" ::: Identity Double
       , "dept" ::: Identity Department
       ]

-----

# Filter processing pipeline

Two types of filters

- Transformational filters
- Row filters
- Implemented via standard Haskell abstractions

-----

# Transformational filters

Transform a particular field

    usrs = Read DB User { Map "age" :using-hs (+ 1) }

      name       age   department
    [ "person1", 20  , IT
    , "person2", 20  , IT
    ]
    =>
    [ "person1", 21  , IT
    , "person2", 21  , IT
    ]

-----

# Row filters

Operate on the rows

    usrAvgAge = Read DB User { Avg "age", GroupBy "dept" , Exclude "name" }

      name       age   department
    [ "person1", 20  , IT
    , "person2", 20  , IT
    , "person3", 25  , Sales
    ]
    =>
      age    department
    [ 20.0 , IT
    , 25.0 , Sales
    ]

-----

# Concurrent by default

Actions run concurrently

    usrAndProfs = InnerJoin
      { usrs = Read DB User {}
      , profs = Api Profile {}
      }
      { input joinOn
          :with { , profs.userId }
          :using-hs `(==)`
      }

Thanks to Haskell's lightweight green threads

-----

# Concurrent by default (2)

* Haxl is a Haskell library that simplifies access to remote data, such as databases or web-based services.
* Efficient scheduling of concurrent data accesses

      {-# LANGUAGE ApplicativeDo #-}

      usrAndProfs = do
        usrs <- dataFetch (Read :: Read DB User)
        apis <- dataFetch (Api :: Api Profile)
        innerJoin usrs apis

-----

# Downsides

- Compilation time and memory consumption are often high when there are a lot of type-level operations.
- Error messages, if not handled by custom type errors, can often be very cryptic.
- Type-level code is syntactically and semantically different from normal Haskell value-level code.
- It is often hard to hire Haskell developers.

-----

# Takeaways

- Prevent bugs at compile time.
- Avoid the boilerplate of having different types.
- Queries run concurrently, thanks to Haskell's excellent concurrency story.
- Types can be used to encode domain-specific information, and the compiler can be used for analysis.

-----

# Questions?

-----

# Thank You

-----

# Addendum

-----

# Row filters

- Profunctors to capture the operation

      class Profunctor p where
        dimap :: (b -> a) -> (c -> d) -> p a c -> p b d

- (->), Star, and Fold from foldl are all profunctors.

      usrAvgAge = Read DB User { FilterBy age :using-hs (> 20) }

      -- data Star f a b = Star { runStar :: a -> f b }
      -- Star Maybe models filtering
      name => Star $ \x -> Just x
      age  => Star $ \x -> if x > 20 then Just x else Nothing
      dept => Star $ \x -> Just x

-----

# Row filters (2)

- Stitch profunctors together with a product profunctor

      name :: Star Maybe Text Text
      age  :: Star Maybe Age Age
      dept :: Star Maybe Department Department

      Star Maybe
        (HList '["name" ::: Text , "age" ::: Age , "dept" ::: Department])
        (HList '["name" ::: Text , "age" ::: Age , "dept" ::: Department])

      class Profunctor p => ProductProfunctor p where
        purePP :: b -> p a b
        (****) :: p a (b -> c) -> p a b -> p a c

-----

# Row filters (3)

- proMap to execute the pipeline

      class (Profunctor pro) => ProMapping pro f' f | pro f' -> f where
        proMap :: pro x' x -> f' x' -> f x

-----
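The `Star Maybe` idea from the addendum can be tried out with a small, self-contained sketch. This is only an approximation under stated assumptions: `Star` is redefined locally as a newtype instead of being imported from the profunctors package, a plain `User` record stands in for the HList row, and the per-field filters are combined by hand through `Maybe`'s Applicative instance rather than through a product profunctor and `proMap`. The names `keep`, `ageAbove20`, `userFilter`, `filterRows`, and `sampleUsers` are hypothetical.

```haskell
import Data.Maybe (mapMaybe)

-- Local stand-in for Star, mirroring the definition quoted in the addendum.
newtype Star f a b = Star { runStar :: a -> f b }

-- Hypothetical row type; the slides use an HList here instead.
data User = User { name :: String, age :: Int, dept :: String }
  deriving (Eq, Show)

-- A field filter that always keeps the value.
keep :: Star Maybe a a
keep = Star Just

-- A field filter that keeps only ages above 20.
ageAbove20 :: Star Maybe Int Int
ageAbove20 = Star $ \x -> if x > 20 then Just x else Nothing

-- Combine the per-field filters into a whole-row filter. If any field
-- filter returns Nothing, the whole row is dropped.
userFilter :: Star Maybe User User
userFilter = Star $ \u ->
  User <$> runStar keep (name u)
       <*> runStar ageAbove20 (age u)
       <*> runStar keep (dept u)

-- Run a row filter over a list of rows, discarding the rejected ones.
filterRows :: Star Maybe a b -> [a] -> [b]
filterRows p = mapMaybe (runStar p)

sampleUsers :: [User]
sampleUsers = [User "person1" 20 "IT", User "person3" 25 "Sales"]
```

In GHCi, `filterRows userFilter sampleUsers` keeps only the `person3` row, because `person1` fails the `> 20` check on `age`.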